WO 00/60869 PCT/US00/09463
PERSPECTIVE-CORRECTED VIDEO PRESENTATIONS 

Related References 

This application claims the benefit of U.S. Provisional Application No. 60/128,613, filed
on April 8, 1999, which is hereby entirely incorporated herein by reference. The following
disclosures are filed concurrently herewith and are expressly incorporated by reference for any
essential material.

1. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86946) entitled
"Remote Platform for Camera". 

2. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86942) entitled 
"Virtual Theater".

3. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86949) entitled 
"Method and Apparatus for Providing Virtual Processing Effects for Wide-Angle Video 
Images". 

Technical Field 

In general, the present invention relates to capturing and viewing images. More
particularly, the present invention relates to capturing and viewing spherical images in a
perspective-corrected presentation.

Background Of the Invention 

With the advent of television and computers, man has pursued the goal of tele-presence:
the perception that one is at another place. Television permits a limited form of tele-presence
through the use of a single view of a television screen. However, one is continually confronted
with the fact that the view provided on a television screen is controlled by another, primarily the
camera operator.

Using an example of a roller coaster, a television presentation of a roller coaster ride
would generally start with a rider's view. However, the user cannot control the direction of
viewing so as to see, for example, the next curve in the track. Accordingly, users merely see
what a camera operator intends for them to see at a given location.
Computer systems, through different modeling techniques, attempt to provide a virtual
environment to system users. Despite advances in computing power and rendering techniques 
permitting multi-faceted polygonal representation of objects and three-dimensional interaction 
with the objects (see, for example, first person video games including Half-life and Unreal), 
users remain wanting a more realistic experience. So, using the roller coaster example above, a 
computer system may display the roller coaster in a rendered environment, in which a user may 
look in various directions while riding the roller coaster. However, the level of detail is 
dependent on the processing power of the user's computer as each polygon must be separately 
computed for distance from the user and rendered in accordance with lighting and other options. 
Even with a computer with significant processing power, one is left with the unmistakable 
feeling that one is viewing a non-real environment. 

Summary 

The present invention discloses an immersive video capturing and viewing system.
Through the capture of at least two images, the system allows for a video data set of an
environment to be captured. The immersive presentation may be streamed or stored for later
viewing. Various implementations are described here including surveillance, pay-per-view,
authoring, 3D modeling and texture mapping, and related implementations.

In one embodiment, the present invention provides pay-per-view interaction with 
immersive videos. The present invention provides for the generation of a wide angle image at 
one location and for the transmission of a signal corresponding to that image to another location, 
with the received transmission being processed so as to provide a pay-per-view perspective- 
corrected view of any selected portion of that image at the other location. The present invention 
provides for the generation of a wide angle image at one location and for the transmission of a 
signal corresponding to that image to another location, with the received transmission being 
processed so as to provide at a plurality of stations a perspective-corrected view of any selected 
portion of that image at any pre-selected positioning with respect to the event being viewed, with 
each station/user selecting a desired perspective-corrected view that may be varied according to a 
predetermined pay-per-view scheme. 

The present invention provides for the generation of a wide angle image at one location 
and for the transmission of a signal corresponding to that image to a plurality of other locations, 
with the received transmission at each location being processed in accordance with pay-per-view
user selections so as to provide a perspective-corrected view of any selected portion of that 
image, with the selected portion being selected at each of the plurality of other locations. 

Accordingly, the present invention provides an apparatus that can provide, on a
pay-per-view basis, an image of any portion of the viewing space within a selected field-of-view
without moving the apparatus to another location, and then electronically correct the image for
visual distortions of the view.

The present invention provides for the pay-per-view user to select the degree of
magnification or scaling desired for the image (zooming in and out) electronically, and where
desired, to provide multiple images on a plurality of windows with different orientations and
magnification simultaneously from a single input spherical video image.

A pay-per-view system may produce the equivalent of pan, tilt, zoom, and rotation within
a selected view, transforming a portion of the video image based upon user or pre-selected
commands, and producing one or more output images that are in correct perspective for human
viewing in accordance with the user pay-per-view selections. In one embodiment, the incoming
image is produced by a fisheye lens that has a wide angle field-of-view. This image is captured
into an electronic memory buffer. A portion of the captured image, either in real time or as
prerecorded, containing a region-of-interest is transformed into a perspective corrected image by
an image processing computer. The image processing computer provides mapping of the image
region-of-interest into a corrected image using, for example, an orthogonal set of transformation
algorithms. The original image may comprise a data set comprising all effective information
captured from a point in space. Allowance is made for the platform (tripod, remote control robot,
stalk supporting the lens structure, and the like). Further, the data set may be modified by
eliminating the top and bottom portions as, in some instances, these regions do not contain
unique material (for example, when straight vertical only looks at a clear sky). The data set may
be stored in a variety of formats including equirectangular, spherical (as shown, for example, in
U.S. Patent Nos. 5,684,937, 5,903,782, and 5,936,630 to Oxaal), cubic, bi-hemispherical,
panoramic, and other representations as are known in the art. The conversion from one
representation to others is within the scope of one of ordinary skill in the art.
The viewing orientation is designated by a command signal generated by either a human
operator or computerized input. The transformed image is deposited in an electronic memory
buffer where it is then manipulated to produce the output image or images as requested by the
command signal.

The present invention may utilize a lens supporting structure which provides alignment
for an image capture means wherein the alignment produces captured images that are aligned for
easy seaming together of the captured images to form spherical images that are used to produce
multiple streams for providing viewing of an event at different positions/locations by a
pay-per-view user.

A video apparatus with a camera having at least two wide-angle lenses, such as fish-eye
lenses with fields of view of at least 180 degrees, produces electrical signals that correspond to
images captured by the lenses. It is appreciated that three lenses of 120 or more degrees may be
used (for example, three 180 degree lenses producing an overlap of 60 degrees per lens). Further,
four lenses of 90 or more degrees may be used as well.
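
By way of illustration only, the overlap figure quoted above can be checked with a short
calculation. The sketch below assumes the lenses are spaced evenly around a single ring, so that
each lens is responsible for 360/N degrees and any field of view beyond that spacing is shared
with its neighbors. The function name and the even-spacing model are assumptions made for this
example and are not taken from the disclosure.

```python
def overlap_per_lens(lens_count: int, field_of_view_deg: float) -> float:
    """Angular overlap per lens for lenses spaced evenly around a ring.

    Assumption for illustration: lenses point outward at equal angular
    spacing, so each lens must cover 360 / lens_count degrees on its own.
    """
    if lens_count < 2:
        raise ValueError("at least two lenses are needed to cover a full circle")
    spacing = 360.0 / lens_count
    overlap = field_of_view_deg - spacing
    if overlap < 0:
        raise ValueError("these lenses leave an uncaptured gap between them")
    return overlap


# Three 180 degree lenses: 180 - 120 = 60 degrees of overlap per lens.
print(overlap_per_lens(3, 180))
# Three 120 degree or four 90 degree lenses just meet, with no overlap.
print(overlap_per_lens(3, 120), overlap_per_lens(4, 90))
```

Under this model, any overlap greater than zero gives the seaming step shared image content to
align against.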

These electrical signals, which are distorted because of the curvature of the lens, are
input to the apparatus, digitized, and seamed together into an immersive video. Despite some
portions being blocked by a supporting platform (for example, as described in concurrently filed
U.S. Serial No. (01096.86946) entitled "Remote Platform for Camera", whose contents are
incorporated herein), the resulting immersive video provides a user with the ability to navigate to
a desired viewing location while the video is playing.

The immersive video may have portions After creating each spherical video image, the
apparatus may transmit a portion representing a view selected by the pay-per-view user, or
alternatively, may compress each image using standard data compression techniques and then
store the images in a magnetic medium, such as a hard disk, for display at real time video rates or
send compressed images to the user, for example over a telephone line.

At each pay-for-play location where viewing is desired, there is apparatus for receiving
the transmitted signal. In the case of the telephone line transmission, "decompression" apparatus
is included as a portion of the receiver. The received signal is then digitized. A selected portion
of the multi-stream transmission of the pay-for-play view of the event is selected by the
pay-for-play viewer and a selected portion of the digitized signal, as selected by operator
commands, is transformed using the algorithms of the above-cited U.S. Pat. No. 5,185,667 into a
perspective-corrected view corresponding to that selected portion. This selection by operator
commands includes options of pan, tilt, and rotation, as well as degrees of magnification.
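
The following is a minimal sketch of how a pan, tilt, and magnification selection could be turned
into a lookup into the circular fisheye image. It is not the algorithm of U.S. Pat. No. 5,185,667.
It assumes an ideal equidistant fisheye (image radius proportional to the angle from the optical
axis), omits rotation about the viewing axis, and uses hypothetical parameter names throughout.

```python
import math


def fisheye_source_point(u, v, out_w, out_h, pan_deg, tilt_deg, view_fov_deg,
                         cx, cy, circle_radius, lens_fov_deg=180.0):
    """Map output pixel (u, v) of a perspective view to fisheye coordinates.

    Assumptions for illustration: equidistant fisheye model, optical axis
    along +z, image y axis pointing down, no rotation about the view axis.
    """
    # Focal length (in pixels) of the virtual perspective camera. A narrower
    # view_fov_deg gives a longer focal length, i.e. more magnification.
    f = (out_w / 2.0) / math.tan(math.radians(view_fov_deg) / 2.0)

    # Unit ray through the output pixel, before any pan or tilt.
    x, y, z = u - out_w / 2.0, v - out_h / 2.0, f
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm

    # Tilt: rotate about the x axis (positive tilt looks toward image "up").
    t = math.radians(tilt_deg)
    y, z = y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t)

    # Pan: rotate about the y axis (positive pan looks toward image "right").
    p = math.radians(pan_deg)
    x, z = x * math.cos(p) + z * math.sin(p), -x * math.sin(p) + z * math.cos(p)

    # Angle from the optical axis and direction around it.
    theta = math.acos(max(-1.0, min(1.0, z)))
    phi = math.atan2(y, x)

    # Equidistant projection: the edge of the image circle corresponds to
    # half of the lens field of view.
    r = circle_radius * theta / math.radians(lens_fov_deg / 2.0)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)
```

Evaluating this function for every pixel of an output window, and sampling the captured image at
the returned coordinates, yields one perspective-corrected view; several windows with different
orientations and magnifications can be produced from the same captured image in the same way.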

Command signals are sent by the pay-for-play user to at least a first transform unit to
select the portion of the multi-stream transmission of the viewing event that is desired to be seen
by the user.

These and other objects of the present invention will become apparent upon consideration 
of the drawings hereinafter in combination with a complete description thereof. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a block diagram of a single lens image capture system in accordance with
embodiments of the present invention.

Figure 2 shows a block diagram of a multiple lens image capture in accordance with
embodiments of the present invention.

Figure 3 shows a tele-centrically-opposed image capture system in accordance with
embodiments of the present invention.

Figure 4 shows an alternative image capture system in accordance with embodiments of
the present invention.

Figure 5 shows yet another alternative image capture system in accordance with
embodiments of the present invention.

Figure 6 shows a developing process flow in accordance with embodiments of the present
invention.

Figure 7 shows various image capture systems and distribution systems in accordance
with embodiments of the present invention.

Figure 8 shows various seaming systems in accordance with embodiments of the present
invention.

Figure 9 shows distribution systems in accordance with embodiments of the present
invention.
Figure 10 shows a file format in accordance with embodiments of the present invention.

Figure 11 shows alternative image representation data structures in accordance with
embodiments of the present invention.

Figure 12 shows a temporal hotspot actuation process in accordance with embodiments of
the present invention.

Figure 13 shows a pay-per-view process in accordance with embodiments of the present
invention.

Figure 14 shows a pay-per-view system in accordance with embodiments of the present
invention.

Figure 15 shows another pay-per-view system in accordance with embodiments of the
present invention.

Figure 16 shows yet another pay-per-view system in accordance with embodiments of the
present invention.

Figure 17 shows a stadium with image capture points in accordance with embodiments of
the present invention.

Figure 18 provides a representation of the images captured at the image capture points of
Figure 17 in accordance with embodiments of the present invention.

Figure 19 shows the image capture perspectives with additional perspectives in
accordance with embodiments of the present invention.

Figure 20 shows another perspective of the system of Figure 19 with a distribution
system in accordance with embodiments of the present invention.

Figure 21 shows an effective field of view concentrating on a playing field in accordance
with embodiments of the present invention.

Figure 22 shows a system for overlaying generated images on an immersive presentation
stream in accordance with embodiments of the present invention.

Figure 23 shows an image processing system for replacing elements in accordance with
embodiments of the present invention.

Figure 24 shows a boxing ring in accordance with embodiments of the present invention.

Figure 25 shows a pay-per-view system in accordance with embodiments of the present
invention.

Figure 26 shows various image capture systems in accordance with embodiments of the
present invention.

Figure 27 shows image analysis points as captured by the systems of Figure 26 in
accordance with embodiments of the present invention.

Figure 28 shows various images as captured with the systems of Figure 26 in accordance
with embodiments of the present invention.

Figure 29 shows a laser range finder with an immersive lens combination in accordance
with embodiments of the present invention.

Figure 30 shows a three-dimensional model extraction system in accordance with
embodiments of the present invention.

Figures 31A-C show various implementations of the system in applications in accordance
with embodiments of the present invention.

Detailed Description 

The system relates to an immersive video capture and presentation system. In capturing
and presenting immersive video presentations, the system, through the use of 180 or more degree
fish eye lenses, captures 360 degrees of information. As will be appreciated from the description,
other lens combinations may be used as well including cameras equipped with lenses of less than
180 degrees fields of view and capturing separate images for seaming. Further, not all data needs
to be captured to accomplish the goals of the present invention. Specifically, panoramic data sets,
which lack a top or bottom portion (e.g., the top or bottom 20 degrees), may be used. Moreover,
data sets of more than 360 degrees may be used (for example, 370 degrees (from two 185 degree
lenses) or 540 degrees (from three 180 degree lenses)) for additional image capture. Accordingly,
for simplicity, reference is made to 360 degree views or spherical data sets. However, it is readily
appreciated that alternative data sets or videos with different amounts of coverage (greater or less
than 360 degrees) may be used equally as well.
It is appreciated that all methods may be implemented in computer readable media in
addition to hardware.

Figure 1 shows a block diagram of a single lens image capture system in accordance with
embodiments of the present invention. Figure 1 is a block diagram of one embodiment of an
immersive video image capture method using a single fisheye lens capture system for use with
the present invention. The system includes a fish-eye lens (which may be greater or less than 180
degrees), an image capture sensor and camera electronics, a compression interface (permitting
compression to different standards including MPEG, MJPG, and even not compressing the file),
and a computer system for recording and storing the resulting image. Also shown in Figure 1 is a
resulting circular image as captured by the lens. The image capture system as shown in Figure 1
captures images and outputs the video stream to be handled by the compression system.

Figure 2 shows a block diagram of a multiple lens image capture in accordance with
embodiments of the present invention. Figure 2 shows two back to back camera systems (as
shown in U.S. Patent No. 6,002,430, which is incorporated by reference), a sensor interface, a
seaming interface, a compression interface, and a communication interface for transmitting the
received video signal onto a communications system. The received transmission is then stored in
a capture/storage system.

Figure 3 shows a tele-centrically-opposed image capture system in accordance with
embodiments of the present invention. Figure 3 details a first objective lens 301 and a second
objective lens 302. Both objective lenses transmit their received images to a prism mirror 303
which reflects the image from objective lens 301 up and the image from objective lens 302
down. Supplemental optics 304 and 305 may then be used to form the images on sensors 306 and
307. An advantage to having tele-centrically opposed optics as shown in Figure 3 is that the
linear distance between lens 301 and lens 302 may be minimized. This minimization attempts to
eliminate non-captured regions of an environment due to the separation of the lenses. The
resulting images are then sent to sensor interfaces 308, 309 as controlled by camera dual sensor
control 310. Camera dual sensor interface 310 may receive control inputs addressing irising
among the two optical paths, color matching between the two images (due to, for example, color
variations in the optics 301, 302, 304, 305, and in the sensors 306, 307), and other processing as
further defined in Figure 11 and in U.S. Serial No. (01096.86949), referenced above. Both image
streams are input into a seaming interface where the two images are aligned. The alignment may
take the form of aligning the first pair, or sets of pairs and applying the correction to all
remaining images, or at least the images contained in a captured video scene.

The seamed video is input into compression system 312 where the video may be
compressed for easier transmission. Next, the compressed video signal is input to communication
interface block 313 where the video is prepared for transmission. The video is next transmitted
via communication interface 314 to a communications network. Receiving the video from the
communications network is an image capture system (for example, a user's computer) 315. A
user specifies 316 a selected portion or portions of the video signal. The portions may comprise
directions of view (as detailed in U.S. Patent No. 5,185,667, whose contents are expressly
incorporated herein). The selected portion or portions may originate with a mouse, joystick,
positional sensors on a chair, and the like as are known in the art and further including a head
mounted display with a tracking system. The system further includes a storage 317 (which may
include a disk drive, RAM, ROM, tape storage, and the like). Finally, a display is provided as
319. The display may take the shape of the display systems as embodied in U.S. Serial No.
(01096.86942).

Figure 4 shows an alternative image capture system in accordance with embodiments of
the present invention. Similar to that of Figure 3, Figure 4 shows an image capture system with a
mirror prism directing images from the objective lenses to a common sensor interface. The
sensor interface 401 may be a single sensor or a dual sensor. Other elements are similar to those
of Figure 3.

Figure 5 shows yet another alternative image capture system in accordance with
embodiments of the present invention. Figure 5 shows an embodiment similar to that of Figure 4
but using light sensitive film. In this embodiment, different film sizes (35 mm, 16 mm, super
35 mm, super 16 mm and the like) may be used to capture the image or images from the optics.
Figure 5 shows different orientations for storing images on the film. In particular, the images
may be arranged horizontally, vertically, etc. An advantage of the super 16 mm and super 35 mm
film formats is that they approximate a 2:1 aspect ratio. With this ratio, two circular images from
the optics may be captured next to each other, thereby maximizing the amount of a frame of film
used.
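
The benefit of a 2:1 frame can be illustrated with a short calculation. The sketch below assumes
two equal circular images placed side by side in a rectangular frame, each as large as the frame
allows, and reports the fraction of the frame area they cover. The function name is hypothetical
and the geometry is an idealization that ignores frame margins and sprocket areas.

```python
import math


def frame_fill_fraction(aspect_ratio: float) -> float:
    """Fraction of a frame covered by two equal side-by-side circular images.

    aspect_ratio is frame width divided by frame height. Each circle is
    limited either by the frame height or by half of the frame width.
    """
    height = 1.0
    width = aspect_ratio * height
    diameter = min(height, width / 2.0)
    circle_area = math.pi * (diameter / 2.0) ** 2
    return 2.0 * circle_area / (width * height)


# A 2:1 frame is covered to pi/4 (about 79 percent); narrower or wider
# frames waste more area for the same two-circle layout.
for ratio in (4 / 3, 2.0, 3.0):
    print(round(ratio, 2), round(frame_fill_fraction(ratio), 3))
```

In this idealized layout the coverage peaks at exactly 2:1, where each circle touches the top and
bottom of the frame and the two circles together span its full width.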
Figure 6 shows a process flow for developing and processing the film from the film plane
into an immersive movie. The film 601 is developed in developer 602. The developed film 603 is
scanned by scanner 604 and the result is stored in storage 605. The storage may also comprise a
disk, diskette, tape, RAM or ROM 606. The images are seamed together and melded into an
immersive presentation in 607. Finally, the output is stored in storage 608.

Figure 7 shows various image capture systems and distribution systems in accordance
with embodiments of the present invention. Capture system cameras 701 may represent 180
degree fish eye lenses, super 180 (233 degrees and greater) fish eye lenses, the various back to
back image capture devices shown above, digital image capture, and film capture. The result of
the image capture in 701 may be sent to a storage 702 for processing by authoring tools 703 and
later storage 704, or may be streamed live 705 to a delivery/distribution system. The
communication link 706 distributes the stored information and sends it to at least one file server
707 (which may comprise a file server for a web site) so as to distribute the information over a
network 709. The distribution system may comprise a unicast transmission or a multicast 708 as
these techniques of distributing data files are known in the art. The resulting presentations are
received by network interface devices 710 and used by users. The network interface devices may
include personal computers, set-top boxes for cable systems, game consoles, and the like. A user
may select at least one portion of the resulting presentation with the control signals being sent to
the network interface device to render a perspective correct view for a user.

Instead of transmitting the presentation over a network (e.g., the Internet), the
presentation may be separately authored or mastered 711 and placed in a fixed medium 712 (that
may include DVDs, CD-ROMs, CD-Videos, tapes, and solid state storage (e.g., Memory
Sticks by the Sony Corporation)).

Figure 8 shows various seaming systems in accordance with embodiments of the present
invention. Input images may comprise two or more separate images 801A or combined images
with two spherical images on them 801B. 801A and 801B show an example where lenses of
greater than 180 degrees were used to capture an environment. Accordingly, an image boundary
is shown and a 180-degree boundary is shown on each image. By defining the 180 degree
boundary, one is able to more easily seam images as one would know where overlapping
portions of the image begin and end. Further, the resolution of the resulting image may depend
on the sampling method used to create the representations of 801A and 801B. The boundaries of
the image are detected in system 802. The system may also find the radius of the image circle. In
the case of offsets or warping to an ellipse, major and minor radii may be found. Further, from
these values, the center of the image may be found (h,v). Next, image enhancement methods may
be applied in step 803 if needed. The enhancement methods may include radial filtering (to
remove brightness shifts as one moves from the center of the lens), color balancing (to account
for color shifts due to lens color variations or sensor variations, for example, having a hot or cold
gamma), flare removal (to eliminate lens flare), anti-aliasing, scaling, filtering, and other
enhancements. Next, the boundaries of the images are matched 804 where one may filter or
blend or match seams along the boundaries of the images. Next, the images are brought into
registration through the registration alignment process 805. These and related techniques may be
found in co-pending PCT Reference No. PCT/US99/07667 filed on April 8, 1999, whose
disclosure is incorporated by reference.
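
As a minimal sketch of the boundary detection described for system 802, the example below
estimates the center (h,v) and the horizontal and vertical radii of an image circle from a single
frame. It assumes the area outside the image circle is darker than a fixed threshold and that any
elliptical warping is aligned with the image axes. The function name, the threshold value, and the
bounding-box approach are assumptions made for this example, not the technique of
PCT/US99/07667.

```python
def find_image_circle(pixels, threshold=16):
    """Estimate center and radii of a fisheye image circle.

    pixels is a list of rows of luminance values (0-255). Returns
    (h, v, radius_x, radius_y), where (h, v) is the estimated center.
    Assumption for illustration: pixels outside the circle are at or
    below the threshold, and the ellipse axes follow the image axes.
    """
    width = len(pixels[0])
    # Rows and columns that contain at least one lit pixel.
    rows = [y for y, row in enumerate(pixels) if any(p > threshold for p in row)]
    cols = [x for x in range(width) if any(row[x] > threshold for row in pixels)]
    if not rows or not cols:
        raise ValueError("no image circle found above the threshold")

    left, right = cols[0], cols[-1]
    top, bottom = rows[0], rows[-1]

    # The center is the midpoint of the lit extent; the radii are half of
    # the lit extent in each direction (they differ if the circle is
    # warped to an ellipse).
    h = (left + right) / 2.0
    v = (top + bottom) / 2.0
    radius_x = (right - left + 1) / 2.0
    radius_y = (bottom - top + 1) / 2.0
    return h, v, radius_x, radius_y
```

Because the alignment found for one pair of images may be reused for the remaining frames of a
scene, an estimate of this kind would typically be computed once per scene rather than per frame.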

Finally, the seaming and alignment applied in step 805 is applied to the remaining video
sequences, resulting in the immersive image output 806.

Figure 9 shows distribution systems in accordance with embodiments of the present
invention. Immersive video sequences are received at a network interface 905 (from lens system
901 and combination interfaces 902 or storage 903 and video server 904). The network interface
outputs the image via a satellite link 906 to viewers (including set-top boxes, personal
computers, and the like). Alternatively, the system may broadcast the immersive video
presentation via a digital television broadcast 907 to receivers (comprising, for example, set-top
boxes, personal computers, and the like). Moreover, the immersive video experience may be
transmitted via ATM, broadband, the Internet, and the like 908. The receiving devices may be
personal computers, set-top boxes and the like.

Likewise, global positioning system data may be captured simultaneously with the image
or by pre-recording or post-recording the location data as is known from the surveying art. The
object is to record the precise latitude and longitude global coordinates of each image as it is
captured. Having such data, one can easily associate front and back hemispheres with one
another for the same image set (especially when considered with time and date data). The path of
image taking from one picture to the next can be permanently recorded and used, for example, to
reconstruct a picture tour taken by a photographer when considered with the date and time of day
stamps.

Other data may be automatically recorded in memory as well (not shown) including
names of human subjects, brief description of the scene, temperature, humidity, wind velocity,
altitude and other environmental factors. These auxiliary digital data files associated with each
image captured would only be limited in type by the provision of appropriate sensing and/or
measuring equipment and the access to digital memory at the time of image capture. One or
more or all of these capabilities may be built into a wide angle digital camera system.

Figure 10 shows a file format in accordance with embodiments of the present invention.
The file format comprises a data structure including an immersive image stream 1001 and an
accompanying audio stream 1002. Here, immersive image stream 1001 is shown with two scenes
1001A and 1001B. In one embodiment, the audio stream is spatially encoded. In another
embodiment, the audio portion is not so encoded. By encoding the audio stream, the user is
presented with a more immersive experience. However, by not encoding the stream, the amount
of non-image information transmitted is reduced. The technique for spatial encoding is described
in greater detail in U.S. Serial No. (01096.86942) entitled "Virtual Theater", filed herewith and
incorporated by reference. To minimize data content and attempt to increase image transfer rates,
one embodiment only uses the combination of the image stream and the audio stream to provide
the immersive experience. However, alternate embodiments permit the addition of additional
information that enables tracking of where the immersive image was captured (location
information 1003 including, for example, GPS information), enables the immersive experience to
have a predefined navigation (auto navigation stream 1004), enables linking between immersive
streams (linked hot spot stream 1005), enables additional information to be overlaid onto the
immersive video stream (video overlay stream 1006), enables sprite information to be encoded
(sprite stream 1007), enables visual effects to be combined on the image stream (visual effects
stream 1008 which may incorporate transitions between scenes), enables position feedback
information to be recorded (position feedback stream 1009), enables timing (time code 1010),
and enables enhanced music to be added (MIDI stream 1011). It is appreciated that various ones
of the data format fields may be added and removed as needed to increase or decrease the
bandwidth consumed and file size of the immersive video presentation.
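
One way to picture the file format of Figure 10 is as a record with two required streams and a set
of optional ones. The sketch below is illustrative only: the class name, the field names, and the
use of raw byte strings for each stream are assumptions made for this example, and the disclosure
does not specify an in-memory or on-disk layout.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ImmersivePresentation:
    """Hypothetical container mirroring the streams named for Figure 10."""
    image_stream: bytes                        # immersive image stream 1001
    audio_stream: bytes                        # audio stream 1002
    location_info: Optional[bytes] = None      # location information 1003
    auto_navigation: Optional[bytes] = None    # auto navigation stream 1004
    linked_hot_spots: Optional[bytes] = None   # linked hot spot stream 1005
    video_overlay: Optional[bytes] = None      # video overlay stream 1006
    sprites: Optional[bytes] = None            # sprite stream 1007
    visual_effects: Optional[bytes] = None     # visual effects stream 1008
    position_feedback: Optional[bytes] = None  # position feedback stream 1009
    time_code: Optional[bytes] = None          # time code 1010
    midi: Optional[bytes] = None               # MIDI stream 1011

    def present_streams(self) -> List[str]:
        """Names of the streams actually carried by this presentation."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def payload_size(self) -> int:
        """Total bytes across all carried streams."""
        return sum(len(getattr(self, name)) for name in self.present_streams())


# Minimal embodiment: image and audio only, to keep bandwidth low.
minimal = ImmersivePresentation(image_stream=b"img", audio_stream=b"au")
print(minimal.present_streams(), minimal.payload_size())
```

Adding or dropping an optional field changes the payload size directly, which corresponds to the
trade-off between added features and bandwidth described above.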
Figure 10 also shows an embodiment where the pay-per-view embodiment of the present
invention uses the described data format. For example, the pay-per-view embodiment allows a
user to select a location for viewing an event, such as for example, the 20 yard line for a football
game, and the delivery system isolates the data needed from the spherical video image that will
provide a view from the selected location and sends it to the pay-for-view event control
transceiver 2302 for viewing on a display 2304 by the user. The user may select a plurality of
locations for viewing that may be delivered to a plurality of windows on his display. Also, the
user may adjust a view using pan, tilt, rotate, and zoom. In addition, the viewing location may be
associated with an object that is moving in the event. For example, by selecting the basketball as
the location of the view, the display will place the basketball at or near the center of the window
and will track the movement of the basketball, i.e., the window will show the basketball at or
near the center of the screen and the camera will follow the movement of the basketball by
shifting the display to maintain the basketball at or near the center of the screen as the basketball
game proceeds. In a sport such as golf, the display may be adjusted to zoom back to encompass a
large area and place a visible screen marker on the golf ball, and where selected by the user, may
leave a path such as is seen with "mouse tails" on a computer screen when the mouse is moved,
to facilitate the user's viewing of the path of the golf ball.
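
The object-following view and the "mouse tails" path described above can be sketched as follows.
The example assumes that the direction of the tracked object (azimuth and elevation, in degrees)
is already known for each frame; how that direction is obtained is outside the sketch. The class
name, the gain parameter, and the fixed-length trail are assumptions made for illustration.

```python
from collections import deque


def shortest_angle_delta(from_deg: float, to_deg: float) -> float:
    """Signed smallest rotation from one heading to another, in [-180, 180)."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0


class TrackedView:
    """Keeps a view centered on a moving object and remembers its recent path."""

    def __init__(self, pan_deg=0.0, tilt_deg=0.0, trail_length=10):
        self.pan_deg = pan_deg
        self.tilt_deg = tilt_deg
        # Recent object directions, oldest first; old entries drop off
        # automatically once trail_length is reached.
        self.trail = deque(maxlen=trail_length)

    def follow(self, object_azimuth_deg, object_elevation_deg, gain=1.0):
        """Move the view toward the object. gain=1.0 re-centers at once;
        smaller values (hypothetical smoothing) move only part of the way."""
        delta = shortest_angle_delta(self.pan_deg, object_azimuth_deg)
        self.pan_deg = (self.pan_deg + gain * delta) % 360.0
        self.tilt_deg += gain * (object_elevation_deg - self.tilt_deg)
        self.trail.append((object_azimuth_deg, object_elevation_deg))
```

Using the shortest signed rotation prevents the view from swinging the long way around when the
object crosses the 0/360 degree boundary of the spherical image.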

In short, a pay-per-view system may transmit the entire immersive presentation and let
the user determine the direction of view and, alternatively, the system may transmit only a
pre-selected portion of the immersive presentation for passive viewing by a consumer. Further, it
is appreciated that a combination of both may be used in practice of the invention without undue
experimentation.

Figure 11 shows alternative image representation data structures in accordance with
embodiments of the present invention. The top portion of Figure 11 shows different image
formats that may be used with the present invention. The image formats include: front and back
portions of a sphere not flipped, sphere-vertical not flipped, a single hemisphere (which may also
be a spherical representation as shown in U.S. Patent Nos. 5,684,937, 5,903,782, 5,936,630 to
Oxaal), a cube, a sphere-horizontal flipped, a sphere vertical flipped, a pair of mirrored
hemispheres, and a cylindrical view, all collectively shown as 1101.

The input images are input into an image processing section (as described in U.S. Patent
Application Serial No. , (Attorney Docket No. 01096.86949) entitled "Method and Apparatus for
Providing Virtual Processing Effects for Wide-Angle Video Images"). The image processing
section may include some or all of the following filters, including a special effects filter 1102 (for
transitioning between scenes, for example, between scenes 1001A and 1001B). Also, video
filters 1105 may include a radial brightness regulator that compensates for loss of image
brightness. Color match filter 1103 adjusts the color of the received images from the various
cameras to account for color offsets from heat, gamma corrections, age, sensor condition, and
other situations as are known in the art. Further, the system may include an image segment
replicator to replicate pixels around a portion of an image occulted by a tripod mount or other
platform supporting structure. Here, the replicator is shown as replacing a tripod cap 1104. Seam
blend 1106 allows seams to be matched and blended as shown in PCT/US99/07667 filed April 8,
1999. Finally, process 1107 adds an audio track that may be incorporated as audio stream 1002
and/or MIDI stream 1011. The output of the processors results in the immersive video
presentation 1108.
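
As one hedged illustration of the radial brightness regulator mentioned above, the following sketch
brightens pixels according to their distance from the image center. The disclosure does not specify
a falloff model; the quadratic gain, the parameter `k`, and the function name
`radial_brightness_correct` are assumptions made only for this example, which operates on a
grayscale image held as a list of rows.

```python
def radial_brightness_correct(image, k=0.5, max_value=255):
    """Brighten pixels in proportion to squared distance from the image center.

    Assumed model (not from the disclosure): gain = 1 + k * (r^2 / r_max^2),
    so the center is unchanged and the corners receive a gain of 1 + k.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    r2_max = cx * cx + cy * cy  # squared distance from center to a corner
    corrected = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            r2 = (x - cx) ** 2 + (y - cy) ** 2
            gain = 1.0 + (k * r2 / r2_max if r2_max > 0 else 0.0)
            # Clamp so that corrected values stay within the pixel range.
            new_row.append(min(max_value, round(value * gain)))
        corrected.append(new_row)
    return corrected
```

A measured falloff curve for the particular lens could be substituted for the quadratic gain without
changing the structure of the loop.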

Referring to Figure 10, linked hot spot stream 1005 provides and removes hot spots (links 
to other immersive streams) when appropriate. For instance, in one example, a user's selection of 
a region relating to a hot spot should only function when the object to which the hot spot links is 
in the displayed perspective corrected image. Alternatively, hot spots may be provided along the
side of a screen or display irrespective of where the immersive presentation is during playback.
In this alternative embodiment, the hot spots may act as chapter listings. 

Figure 12 shows a process for acting on the hot spot stream 1005. For reference, image
1201 shows three homes for sale during a real estate tour as may be viewed while virtually
driving a car. While proceeding down the street from image 1201 to 1202, houses A and B are
no longer in view. In one embodiment, the hot spots linking to immersive video presentations of
houses A and B (for example, tours of the grounds and the interior of the houses) are removed
from the hot spots available to the viewer. Rather, only a hot spot linking to house C is available
in image 1202. Alternatively, all hot spots may be separately accessible to a user as needed, for
example on the bottom of a displayed screen or through keyboard or related input. The operation
of the hot spots is discussed below. In step 1203, a user's input is received. It is determined in
step 1204 where the user's input is located on the image. In step 1205 it is determined if the input
designates a hot spot. If yes, the system transitions to a new presentation 1206. If not, the system
continues with the original presentation 1207. As to the pay-per-view aspect of the present
invention, the system allows one to charge per viewing of the homes on a per use basis. The tally
for the cost for each tour may be calculated based on the number of hot spots selected.
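
Steps 1203 through 1207 and the per-use tally may be sketched as follows. This is a minimal sketch
under stated assumptions: hot spots are modeled as rectangles in display coordinates, and the names
`HotSpot`, `handle_input`, and `tour_cost_cents`, together with the per-hot-spot price, are
hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class HotSpot:
    target: str  # name of the presentation the hot spot links to
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def handle_input(x, y, active_hot_spots, current_presentation):
    """Locate the input (1204), test it against hot spots (1205), and choose
    the presentation to show next (1206 or 1207)."""
    for spot in active_hot_spots:
        if spot.contains(x, y):
            return spot.target, True  # step 1206: transition
    return current_presentation, False  # step 1207: continue as before


def tour_cost_cents(selected_count, price_per_hot_spot_cents):
    """Tally the cost of a tour from the number of hot spots selected."""
    return selected_count * price_per_hot_spot_cents
```

Passing only the hot spots currently in view as `active_hot_spots` corresponds to the embodiment in
which the links to houses A and B are removed once those houses leave the displayed image.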

Figure 13 shows another method of deriving an income stream from the use of the
described system. In step 1301, a user views a presentation with reception of user information
directing the view. If a user activates the change in field of view to, for example, follow the
movement of the game or to view alternative portions of a streamed image, the user may be
charged for the modification. The record of charges is compiled in step 1302, and the charge to
the account occurs in step 1303.

Figure 14 shows a pay-per-view system in accordance with embodiments of the present
invention. The invention provides a pay-per-view delivery system that delivers at least a selected
portion of video images for at least one view of the event selected by a pay-per-view user. The
event is captured in spherical video images via multiple streaming data streams. The portion of
the streaming data streams that is delivered represents the view of the event selected by the
pay-per-view user. More than one view may be selected and viewed using a plurality of windows by
the user. Typically, the event is captured using at least one digital wide angle or fisheye lens. The
pay-for-view delivery system includes a camera imaging system/transceiver 3002, at least one event
view control transceiver 3004, and a display 3006. In this embodiment, the camera imaging
system/transceiver includes at least two wide-angle lenses or a fisheye lens and, upon receiving
control signals from the user selecting the at least one view of the event, simultaneously captures
at least two partial spherical video images for the event, produces output video image signals
corresponding to said at least two partial spherical video images, digitizes the output video
image signals (where needed, the digitizer includes a seamer for seaming together said
digitized output video image signals into seamless spherical video images and a memory for
digitally storing or buffering data representing the digitized seamless spherical video images), and
sends digitized output video image signals for the at least one portion of the multiple streaming
data streams representing the at least one event to the event control transceiver. The memory
may also be utilized for storing billing data. Capturing the spherical video images may be
accomplished as described, for example, in United States Patent No. 6,002,430 (Method and
Apparatus For Simultaneous Capture Of A Spherical Image by Danny A. McCall and H. Lee
Martin). Thus, upon capturing the spherical video images in a stream, the camera imaging
system/transceiver digitizes and seams together, where needed, the images and sends the portion
for the selected view to the at least one event view control transceiver.

The at least one event view control transceiver 3004 is coupled to send control signals
activated by the user selecting the at least one view of the event and to receive the digitized
output video image signals from the camera-imaging system/transceiver 3002. The event view
control transceiver 3004 typically is in the form of a handheld remote control 3008 and a set-top
box 3010 coupled to a video display system such as a computer CRT, a television, a projection
display, a high definition television, a head mounted display, a compound curve torus screen, a
hemispherical dome, a spherical dome, a cylindrical screen projection, a multi-screen compound
curve projection system, a cube cave display, or a polygon cave. However, where desired, the event
view control transceiver may have the controls in the set-top box. Where a remote control device
is used, the handheld remote control portion of the event view control transceiver is arranged to
communicate with a set-top box portion of the event view control transceiver so that the user
may more conveniently issue control signals to the pay-per-view delivery system and adjust the
selected view using pan, tilt, rotate, and zoom adjustments. In one embodiment, the remote
control portion has a touch screen with controls for the particular event shown thereon. The user
simply inputs the location of the event (typically the channel and time), touches the desired view
and the pan, tilt, rotate, and zoom as desired, to initiate viewing of the event at the desired view.
The event view controls send control signals indicating the at least one view for the event. The
event view control transceiver receives at least the digitized portion of the output video image
signals that encompasses said view/views selected and uses a transformer processor to process
the digitized portion of the output video image signals to convert the output video image signals
representing the view/views selected to digital data representing a perspective-corrected planar
image of the view/views selected.

The display is coupled to receive and display streaming data for the perspective-corrected 
planar image of the view/views for the event in response to the control signals. The display may 
show the at least one view or a plurality of views in a plurality of windows on the screen. For 
example, one may show the front view from a platform and the side view or back view off the
platform. Each window may simultaneously display a view that is simultaneously controllable by
separate user input of any combination of pan, tilt, rotate, and zoom. 

The event view controls may include switchable channel controls to facilitate user
selection and viewing of alternative/additional simultaneous views as well as controls for
implementing pan, tilt, rotate, and zoom settings. Generally, billing is based on a number of
views selected for a predetermined time period and a total viewing time utilized. Billing may be
accomplished by charging an amount due to a predetermined credit card of the user,
automatically deducting an amount due from a bank account of the user, sending a bill for an
amount due to the user, or the like.
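
A billing computation of the kind described above, based on the number of views selected and the
total viewing time, may be sketched as follows. The rates, the use of integer cents, and the name
`compute_charge_cents` are assumptions made for illustration; the disclosure does not fix any
particular tariff.

```python
def compute_charge_cents(views_selected, minutes_viewed,
                         per_view_cents, per_minute_cents):
    """Combine a per-view charge with a charge for total viewing time.

    Amounts are kept in integer cents to avoid floating point rounding.
    """
    if views_selected < 0 or minutes_viewed < 0:
        raise ValueError("usage counts must be non-negative")
    return views_selected * per_view_cents + minutes_viewed * per_minute_cents
```

The resulting amount may then be charged to a credit card, deducted from a bank account, or placed
on a bill, as set out above.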

Figure 15 shows another pay-per-view system in accordance with embodiments of the
present invention.

The invention provides a method for displaying at least one view location of an event for
a pay-per-view user utilizing streaming spherical video images. The steps of the method include:
sequentially capturing a video stream of an event 1501, selecting at least one viewing location,
receiving an immersive video stream regarding the at least one viewing location 1503, and receiving
a user input and correcting a selected portion for viewing 1504.

The method may further include the step of dynamically switching/adding 1505 a
portion of the streaming spherical video images in accordance with selecting, by the user,
alternative/additional simultaneous view locations. The method may also include receiving user
input regarding the new selection and perspective correcting the new portion 1506. The method
may include the step of billing 1507 based on a number of view locations selected for the time
period and, alternatively or in combination, billing for a total time viewing the image stream.
Billing is generally implemented by charging an amount due to a predetermined credit card of
the user, automatically deducting an amount due from a bank account of the user, or sending a
bill for an amount due to the user. Viewing is typically accomplished via one of: a computer
CRT, a television, a projection display, a high definition television, a head mounted display, a
compound curve torus screen, a hemispherical dome, a spherical dome, a cylindrical screen
projection, a multi-screen compound curve projection system, a cube cave display, and a polygon
cave (as are discussed in U.S. Serial No. (01096.86942) entitled "Virtual Theater").

Figure 16 shows yet another pay-per-view system in accordance with embodiments of the
present invention. Shown schematically at 11 is a wide angle, e.g., a fisheye, lens that provides
an image of the environment with a 180 degree field-of-view. The lens is attached to a camera 12
which converts the optical image into an electrical signal. These signals are then digitized
electronically in an image capture unit 13 and stored in an image buffer 14 within the present
invention. An image processing system consisting of an X-MAP and a Y-MAP processor shown
as 16 and 17, respectively, performs the two-dimensional transform mapping. The image
transform processors are controlled by the microcomputer and control interface 15. The
microcomputer control interface provides initialization and transform parameter calculation for
the system. The control interface also determines the desired transformation coefficients based
on orientation angle, magnification, rotation, and light sensitivity input from an input means such
as a joystick controller 22 or computer input means 23. The transformed image is filtered by a
2-dimensional convolution filter 28 and the output of the filtered image is stored in an output
image buffer 29. The output image buffer 29 is scanned out by display electronics/event view
control transceiver 20 to a video display monitor 21 for viewing. Where desired, a remote control
24 may be arranged to receive user input to control the display monitor 21 and to send control
signals to the event view control transceiver 20 for directing the image capture system with
respect to desired view or views which the pay-per-view user wants to watch.
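
The two-dimensional transform mapping performed by the X-MAP and Y-MAP processors may be
pictured as a pair of lookup tables that give, for each pixel of the planar output, the source
coordinates in the fisheye image. The sketch below is not the transform of this disclosure, which is
not reproduced here; it assumes an ideal equidistant 180 degree fisheye (image radius proportional
to the angle from the optical axis), and the function `build_maps`, its parameters, and its axis
conventions are hypothetical.

```python
import math


def build_maps(out_w, out_h, fov_deg, pan_deg, tilt_deg,
               fish_radius, fish_cx, fish_cy):
    """Return (x_map, y_map) giving fisheye source coordinates per output pixel.

    Assumptions: equidistant fisheye covering 180 degrees, with its optical
    axis along +z; x is to the right and y is down in both images.
    """
    focal = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    x_map, y_map = [], []
    for v in range(out_h):
        row_x, row_y = [], []
        for u in range(out_w):
            # Ray through the output pixel before any view rotation.
            x = u - (out_w - 1) / 2
            y = v - (out_h - 1) / 2
            z = focal
            # Tilt about the x axis, then pan about the y axis.
            y1 = y * math.cos(tilt) - z * math.sin(tilt)
            z1 = y * math.sin(tilt) + z * math.cos(tilt)
            x2 = x * math.cos(pan) + z1 * math.sin(pan)
            z2 = -x * math.sin(pan) + z1 * math.cos(pan)
            theta = math.atan2(math.hypot(x2, y1), z2)  # angle off the axis
            phi = math.atan2(y1, x2)                    # direction in the image
            r = fish_radius * theta / (math.pi / 2)     # equidistant model
            row_x.append(fish_cx + r * math.cos(phi))
            row_y.append(fish_cy + r * math.sin(phi))
        x_map.append(row_x)
        y_map.append(row_y)
    return x_map, y_map
```

In this sketch the magnification input corresponds to `fov_deg` and the orientation angle to
`pan_deg` and `tilt_deg`; once the tables are built, each output pixel is filled by sampling the
fisheye image at the mapped coordinates.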

The user of software may view perspectively correct smaller portions and zoom in on 
those portions from any direction as if the user were in the environment, causing a virtual reality 
experience. 

The digital processing system need not be a large computer. For example, the digital
processor may comprise an IBM/PC-compatible computer equipped with a Microsoft
WINDOWS 95 or 98 or WINDOWS NT 4.0 or later operating system. Preferably, the system
comprises a quad-speed or faster CD-ROM drive, although other media may be used such as
Iomega ZIP discs or conventional floppy discs. An Apple Computer manufactured processing
system should have a MACINTOSH Operating System 7.5.5 or later operating system with
QuickTime 3.0 software or later installed. The user should ensure that there exists at least 100
megabits of free hard disk space for operation. An Intel Pentium 133 MHz or 603e PowerPC 180
MHz or faster processor is recommended so the captured images may be seamed together and
stored as quickly as possible. Also, a minimum of 32 megabits of random access memory is
recommended.

Image processing software is typically produced as software media and sold for loading
on a digital signal processing system. Once the software according to the present invention is
properly installed, a user may load the digital memory of the processing system with digital image
data from a digital camera system, digital audio files, global positioning data, and all other data
described above as desired and utilize the software to seam each two-hemisphere set of digital
images together to form DPIX images.

Figure 17 shows a stadium with image capture points in accordance with embodiments of
the present invention. Figure 17 relates to another event capture system and depicts a sport
stadium with event capture cameras located at points A-F. To show the flexibility of placing
cameras, cameras G are placed on the top of goal posts.

Figure 18 provides a representation of the images captured at the image capture points of
Figure 17 in accordance with embodiments of the present invention. Figure 18 shows the
immersive capture systems of points A-F. While the points are shown as spheres, it is readily
appreciated that non-spherical images may be captured and used as well. For example, three
cameras may be used. If the cameras have lenses with a field of view of greater than 120 degrees
each, the overlapping portion may be discarded or used in the seaming process.
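
The overlap in the three-camera example follows from simple arithmetic, sketched below for cameras
assumed to be evenly spaced around a full 360 degree circle. The function name `seam_overlap_deg`
and the even-spacing assumption are illustrative only and are not taken from this disclosure.

```python
def seam_overlap_deg(num_cameras, lens_fov_deg):
    """Angular overlap between neighboring cameras spaced evenly around a circle.

    Each camera is responsible for 360 / num_cameras degrees; any field of
    view beyond that is shared with its neighbors. A negative result means
    there is a gap rather than an overlap.
    """
    if num_cameras < 1:
        raise ValueError("at least one camera is required")
    return lens_fov_deg - 360.0 / num_cameras
```

For three cameras with 130 degree lenses, each seam has 10 degrees of shared coverage available to
be discarded or used in the seaming process.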

Figure 19 shows the image capture perspectives with additional perspectives in
accordance with embodiments of the present invention. By increasing the number of cameras
arranged around the perimeter of the arena, the effective capture zone may be increased to a
torus-like shape. Figure 19 shows the outline of the shape with more cameras disposed between
points A-F.

Figure 20 shows another perspective of the system of Figure 19 with a distribution
system in accordance with embodiments of the present invention. The distribution system
2001 receives data from the various capture systems at the various viewpoints. The distribution
system permits various ones of end users X, Y, and Z to view the event from the various capture
positions. So, for example, one can view a game from the goal line every time the play occurs at
that portion of the playing field.

Figure 21 shows an effective field of view concentrating on a playing field in accordance 
with embodiments of the present invention. The effective field of view concentrates on the 
playing field only in this embodiment. In particular, the effective viewing area created by the
sum of all immersive viewing locations comprises the shape of a reverse torus. 

Figure 22 shows a system for overlaying generated images on an immersive presentation
stream in accordance with embodiments of the present invention. Figure 22 shows a technique
for adding value to an immersive presentation. An image is captured as shown in 2201. The
system determines the location of designated elements in an image, for example, the flag
marking the 10 yard line in football. The system may use known image analysis and matching
techniques. The matching may be performed before or after perspective correcting a selected
portion. Here, the system may use the detection of the designated element as the selected input
control signal. The system next corrects the selected portion 2203, resulting in perspective
corrected output 2204. The system, using similar image analysis techniques, determines the
location of fixed information (in this example, the line markers) 2205 as shown in 2206 and
creates an overlay 2207 to comport with the location of the designated element (the 10 yard line
flag) and commensurate with the appropriate shape (here, parallel to the other line markers). The
system next warps the overlay to fit the shape of the original image 2201, as shown by step
2209, resulting in image 2210. Finally, in step 2211, the overlay is applied to the original
image, resulting in image 2212. It is appreciated that a color mask may be used to define image
2210 so as to be transparent to all except the color of playing field 2213. Using this technique, a
viewer would have a timely representation of the 10 yard marker despite looking in various
directions, as the marking line 2210 would be part of the immersive video stream shown to the
end users. It is appreciated that the corrections may be performed before the game starts, with
pre-stored elements 2210 ready to be applied as soon as the designated element is detected.
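
The color mask described above may be illustrated with the following sketch, in which an overlay
pixel is applied only where the original pixel is close to the color of the playing field, so that
players and other objects remain in front of the marking line. The names `composite_on_field`,
`field_color`, and `tolerance`, and the use of `None` for transparent overlay pixels, are assumptions
made for this example and are not part of the disclosure.

```python
def composite_on_field(original, overlay, field_color, tolerance=30):
    """Apply overlay pixels only where the original matches the field color.

    Images are lists of rows of (r, g, b) tuples; an overlay pixel of None
    is treated as fully transparent.
    """
    def is_field(pixel):
        return all(abs(c - f) <= tolerance for c, f in zip(pixel, field_color))

    result = []
    for orig_row, over_row in zip(original, overlay):
        new_row = []
        for orig_px, over_px in zip(orig_row, over_row):
            if over_px is not None and is_field(orig_px):
                new_row.append(over_px)   # draw the marker on the field
            else:
                new_row.append(orig_px)   # keep players, ball, and so on
        result.append(new_row)
    return result
```

Because the test is made per pixel against the original image, a pre-stored overlay may be applied
as soon as the designated element is detected.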

Figure 23 shows an image processing system for replacing elements in accordance with
embodiments of the present invention. Figure 23 shows another value added way of transmitting
information to end users. First, in step 2301, the system locates designated elements (here,
advertisement 2302 and hockey puck 2303). The designated elements may be found by various
means as known in the art, including, but not limited to, a radio frequency transmitter located
within the puck and correlated to the image as captured by an immersive capture system 2304,
by image analysis and matching 2305, and by knowing the fixed position of an advertisement
2302 in relation to an immersive video capture system. Next, a correction or replacement image
for the elements 2302 and 2303 is pulled from a storage (not shown for simplicity) with
corrected images being represented by 2308 and 2309. The corrected images are warped 2310 to
fit the distortion of the immersive video portion at which location the elements are located (to
shapes 2311 and 2312). Finally, the warped versions of the corrections 2311 and 2312 are
applied to the image in step 2313 as 2314 and 2315. It is appreciated that fast moving objects
may be left without correction and distortion so as to increase the video throughput of correcting
images. Viewers may not notice the lack of correction to some elements 2315.

Figure 24 shows a boxing ring in accordance with embodiments of the present invention. 
Here, immersive video capture systems are shown arranged around the boxing ring. The capture 
systems may be placed on a post of the ring 2401, suspended away from the ring 2403, or spaced
from yet mounted to the posts 2402. Finally, a top level view may be provided of the whole ring
2404. The system may also locate the boxers and automatically shift views to place the viewer 
closest to the opponents. 

Figure 25 shows a pay-per-view system in accordance with embodiments of the present 
invention. First, a user purchases 2501 a key. Next, the user's system applies the key 2502 to the
user's viewing software that permits perspective correction of a selected portion. Next the system
permits selected correction 2503 based on user input. As a value added, the system may permit 
tracking of action of a scene 2504. 

Figure 26 shows various image capture systems in accordance with embodiments of the 
present invention. Aerial platform 2601 may contain GPS locator 2602 and laser range finder
2603. The aerial platform may comprise a helicopter or plane. The aerial platform 2601 flies
over an area 2604 and captures immersive video images. As an alternative, the system may use a 
terrestrial based imaging system 2605 with GPS locator 2608 and laser range finder 2607. The 
system may use the stream of images captured by the immersive video capture system to 
compute a three dimensional mapping of the environment 2604. 

Figure 27 shows image analysis points as captured by the systems of Figure 26 in
accordance with embodiments of the present invention. The system captures images based on a
given frame rate. Via the GPS receiver, the system can capture the location of where the image
was captured. As shown in Figure 27, the system can determine the location of edges and, by
comparing perspective corrected portions of images, determine the distance to the edges. Once
the positions of 2701 and 2702 are known, one may use known techniques to determine the
locations of objects A and B. By using a stream of images, the system may verify the location of
objects A and B with a third immersive image 2703. This may also lead to the determination of
the locations of objects C and D.

Both platforms 2601 and 2608 may be used to capture images. Further, one may compute
the distance between images 2701 and 2702 by knowing the velocity of the platform and the
image capture rate. Systems disclosing object location include U.S. Patent No. 5,694,531 and 
U.S. Patent No. 6,005,984. 
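
The computation of the distance between two capture positions, and one of the known techniques
for locating an object from them, may be sketched as follows. The sketch assumes a platform moving
in a straight line at constant velocity along the x axis and bearings measured from the direction of
travel to the object; the names `baseline_m` and `triangulate` are hypothetical, and the cited
patents are not being reproduced here.

```python
import math


def baseline_m(velocity_m_per_s, frames_per_s):
    """Distance traveled between consecutive frames (e.g. 2701 and 2702)."""
    if frames_per_s <= 0:
        raise ValueError("frame rate must be positive")
    return velocity_m_per_s / frames_per_s


def triangulate(baseline, bearing1_rad, bearing2_rad):
    """Locate an object from bearings taken at (0, 0) and (baseline, 0).

    With tan(b1) = y / x and tan(b2) = y / (x - baseline), solving for x
    gives x = baseline * tan(b2) / (tan(b2) - tan(b1)). Bearings of exactly
    90 degrees are not handled by this simple form.
    """
    t1, t2 = math.tan(bearing1_rad), math.tan(bearing2_rad)
    if math.isclose(t1, t2):
        raise ValueError("rays are parallel; the object cannot be located")
    x = baseline * t2 / (t2 - t1)
    y = x * t1
    return x, y
```

A third image taken one further baseline along the path gives a second pair of bearings, which may
be used to verify the result in the manner described for image 2703.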

Further, one may use a second platform 2606 at a different time of the day to capture a 
slightly different image set of environment 2604. By having a different position of the sun,
different edges may be revealed and captured. Using this time differential method, one may find
edges not found in one single image. Further, one may compare the two 3D models and take 
various values to determine the locations of polygons in the data sets. 

Figure 28A shows an image 2701 taken at a first location. Figure 28B shows image 2702
captured at a second location. Figure 28C shows image 2703 taken at a third location.

Figure 29 shows a laser range finder and lens combination scanning between two trees.

Moreover, as shown in Figure 30, one may use a laser range finder to determine distances
to elements on the side of the platform. The system correlates the images to the laser range finder
data 3001. Next, the system creates a model of the environment 3002. First, the system finds
edges 3004. Next, the system finds distances to the edges 3005. Next, the system creates polygons
from the edges 3006. Next, the system paints the polygons with the colors and textures of a
captured image 3003.

Figures 31A-C show a plurality of applications that utilize advantages of immersive 
video in accordance with the present invention. These applications include, e.g., remote 
collaboration (teleconferencing), remote point of presence camera (web-cam, security and
surveillance monitoring), transportation monitoring (traffic cam), Tele-medicine, distance
learning, etc. 

Referring to Figure 31A, an exemplary arrangement of the invention as used in
teleconferencing/remote collaboration is shown. Locations A-N 3150A-3150N (where N is a
plurality of different locations) may be configured for teleconferencing and/or remote
collaboration in accordance with the invention. Preferably, each location includes, e.g., an
immersive video capture apparatus 3151A-N (as described in this and related applications), at
least one personal computer (PC) including display 3152A-N and/or a separate remote display
3153A-N. The immersive video apparatus 3151 is preferably configured in a central location to
capture real time immersive video images for an entire area requiring no moving parts. The
immersive video apparatus 3151 may output captured video image signals received by a plurality
of remote users at the remote locations 3150 via, e.g., the Internet, Intranet, or a dedicated
teleconferencing line (e.g., an ISDN line). Using the invention, remote users can independently
select areas of interest (in real time video) during a teleconference meeting. For example, a first
remote user at location B 3150B can view an immersed video image captured by immersive video
apparatus 3151A at location A 3150A. The immersed image can be viewed on a remote display
3153B and/or display coupled to PC 3152B. The first remote user can select areas of interest in
the displayed immersed image for perspective corrected video viewing. The system produces the
equivalent of pan, tilt, zoom, and rotation within a selected view, transforming a portion of the
captured video image based upon user or pre-selected commands, and producing one or more
output images that are in correct perspective for human viewing in accordance with the user
selections. The perspective corrected image is further provided in real time video and may be
displayed on remote display 3153 and/or PC display 3152. A second remote user at, e.g., location
B 3150B or location N 3150N, can simultaneously view the immersed video image captured by
the same immersive video apparatus 3151A at location A 3150A. The second user can view the
immersed image on the remote display or on a second PC (not shown). The second remote user
can select areas of interest in the displayed immersed image for perspective corrected video
viewing independent of the first remote user. In this manner each user can independently view a
particular area of interest captured by the same immersive video apparatus 3151A without
additional cameras and/or cameras conventionally requiring mechanical movements to capture
images of particular areas of interest. PC 3152 preferably is configured with remote collaboration
software (e.g., Collaborator by Netscape, Inc.) so that users at the plurality of locations 3150A-N
can share information and collaborate on projects as is known. The remote collaboration
software in combination permits a plurality of users to share information and conduct remote
conferences independent of other users.

Referring to Figure 31B, an exemplary arrangement of the invention as used in security
monitoring and surveillance is shown. In a preferred arrangement, a single immersive video
capture apparatus 3161, in accordance with the invention, is centrally installed for surveillance.
In this arrangement, the single apparatus 3161 can be used to monitor an open area of an interior
of a building, or monitor external premises, e.g., a parking lot, without requiring a plurality of
cameras or conventional cameras that require mechanical movements to scan areas greater than
the field of view of the camera lens. The immersive video image captured by the immersive
video apparatus 3161 may be transmitted to a display 3163 at remote location 3162. A user at
remote location 3162 can view the immersed video image on display or monitor 3163. The user
can select an area of particular interest for viewing in perspective corrected real time video.

Referring to Figure 31C, an exemplary arrangement of the invention as used in
transportation monitoring (e.g., traffic cam) is shown. In this configuration, an immersive video
apparatus 3171, in accordance with the invention, is preferably located at a traffic intersection, as
shown. It is desirable that the immersive video apparatus 3171 is mounted in a location such that
the entire intersection can be monitored in immersive video using only a single camera. In
accordance with the invention, the captured immersive video image may be received at a remote
location and/or a plurality of remote locations. Once the immersed video image is received, the
user or viewer of the image can select particular areas of interest for perspective corrected
immersive video viewing. The immersive video apparatus 3171 produces the equivalent of pan,
tilt, zoom, and rotation within a selected view, transforming a portion of the video image based
upon user or pre-selected commands, and producing one or more output images that are in
correct perspective for human viewing in accordance with the user selections. In contrast to
conventional techniques that require a plurality of cameras located in each direction (in some
cases multiple cameras in each direction), the present invention preferably utilizes a single
immersive video apparatus 3171 to capture immersive video images in all directions.

Accordingly, there has been described herein a concept as well as several embodiments
including a preferred embodiment of a pay-for-view display delivery system for delivering at
least a selected portion of video images for an event wherein the event is captured via multiple
streaming data streams and the delivery system delivers a display of at least one view of the
event, selected by a pay-per-view user, using at least one portion of the multiple streaming data
streams and wherein the event is captured using at least one digital wide angle/fisheye lens.



Although the present invention has been described in relation to particular preferred
embodiments thereof, many variations, equivalents, modifications and other uses will become
apparent to those skilled in the art. It is preferred, therefore, that the present invention be limited
not by the specific disclosure herein, but only by the appended claims.

CLAIMS

We claim: 

1. A pay-for-view display delivery system for delivering at least a selected portion 
of video images for an event wherein the event is captured via multiple streaming data streams 
and the delivery system delivers a display of at least one view of the event selected by a pay-per- 
view user using at least one portion of the multiple streaming data streams and wherein the event 
is captured using at least one digital wide angle/fisheye lens comprising: 

a camera imaging system/transceiver having at least two wide-angle lenses/a fisheye lens 
for receiving control signals from the user selecting the at least one view of the event, 
simultaneously capturing at least two partial spherical video images for the event, producing 
output video image signals corresponding to said at least two partial spherical video images, 
digitizing the output video image signals, wherein, where needed, the digitizer includes a seamer 
for seaming together said digitized output video image signals into seamless spherical video 
images and a memory for digitally storing/buffering data representing said digitized seamless 
spherical video images and where selected, for storing billing data, and sending digitized output 
video image signals for the at least one portion of the multiple streaming data streams 
representing the at least one event to the event control transceiver, 

the at least one event view control transceiver, coupled to send control signals activated 
by the user selecting the at least one view of the event and to receive the digitized output video 
image signals from said camera-imaging system/transceiver, having event view controls for 
selecting and sending control signals indicating at least one view for an event and for receiving at 
least the digitized portion of the output video image signals that encompasses said view/views 
selected, wherein the event view control transceiver includes a transformer processor, responsive 
to said digitized portion of the output video image signals, for converting said output video 
image signals representing the view/views selected to digital data representing a perspective- 
corrected planar image of the view/views selected; and 

a display, coupled to receive and display streaming data for said perspective-corrected 
planar image of the view/views for the event in response to said control signals, wherein said 
display is shown on at least one window that displays the at least one view of a plurality of views 
from said seamless spherical video images, and wherein each window that may simultaneously 
display a view is simultaneously controllable by separate user input of any combination of pan, 
tilt, rotate, and zoom. 

2. The pay-for-view display delivery system of claim 1 wherein the event view 
controls include dynamically switchable channel controls to facilitate user selection and viewing 
of alternative/additional simultaneous views. 

3. The pay-for-view display delivery system of claim 1 wherein the event view 
controls include dynamically switchable channel controls to facilitate user selection and viewing 
of alternative/simultaneous views using at least one different one of: pan, tilt, rotate, and zoom 
setting. 

4. The pay-for-view delivery system of claim 1 wherein the user is billed on a 
periodic basis based on a number of views selected for the time period and a total viewing time 
utilized. 

5. The pay-for-view delivery system of claim 4 wherein billing of the user is 
accomplished by charging an amount due to a predetermined credit card of the user. 

6. The pay-for-view delivery system of claim 4 wherein billing of the user is 
accomplished by automatically deducting an amount due from a bank account of the user. 

7. The pay-for-view delivery system of claim 4 wherein billing of the user is 
accomplished by sending a bill for an amount due to the user. 

8. A method of displaying at least one view location of an event for a pay-per-view 
user utilizing streaming spherical video images, comprising the steps of: 

selecting, by a pay-per-view user, the at least one viewing location of the event to be 
viewed; 

sequentially capturing, by a spherical video image capturing system, said streaming 
spherical video images for the event at real-time video rates; 

receiving, by the pay-per-view user, and perspective-correcting a portion of the streaming 
spherical video images that corresponds to the pay-per-view user's selecting of the at least one 
viewing location; and 

sequentially displaying, at real-time video rates, the portion of the streaming spherical 
video images that has been perspective-corrected, wherein the viewing location/locations 
has/have been transformed to appear to emanate from the at least one viewing location for the 
event selected by the pay-per-view user. 

9. The method of claim 8 further including dynamically switching/adding a portion 
of the streaming spherical video images in accordance with selecting, by the user, 
alternative/additional simultaneous view locations. 

10. The method of claim 8 further including dynamically switching/altering a portion 
of the streaming spherical video images in accordance with selecting, by the user, 
alternative/additional simultaneous view locations using at least one different one of: pan, tilt, 
rotate, and zoom setting. 

11. The method of claim 8 further including the step of billing the user on a periodic 
basis based on a number of view locations selected for the time period and a total viewing time 
utilized. 

12. The method of claim 11 wherein billing of the user is accomplished by charging 
an amount due to a predetermined credit card of the user. 

13. The method of claim 11 wherein billing of the user is accomplished by 
automatically deducting an amount due from a bank account of the user. 

14. The method of claim 11 wherein billing of the user is accomplished by sending a 
bill for an amount due to the user. 

15. The method of claim 8, wherein viewing is accomplished via one of: a computer 
CRT, a television, a projection display, a high definition television, a head mounted display, a 
compound curve torus screen, a hemispherical dome, a spherical dome, a cylindrical screen 
projection, a multi-screen compound curve projection system, a cube cave display, and a polygon 
cave. 

16. A computer-readable medium having computer-executable instructions for 
displaying at least one view location of an event for a pay-per-view user utilizing streaming 
spherical video images, comprising the steps of: 

receiving information indicating selection, by a pay-per-view user, of the at least one 
viewing location of the event to be viewed; 

sequentially capturing said streaming spherical video images for the event at real-time 
video rates from a streaming spherical video capturing system; 

receiving and perspective-correcting a portion of the streaming spherical video images 
that corresponds to the pay-per-view user's selection of the at least one viewing location; and 

sequentially sending, to a display/recording device at real-time video rates, the portion of 
the streaming spherical video images that has been perspective-corrected wherein the viewing 
location/locations has/have been transformed to appear to emanate from the at least one viewing 
location for the event selected by the pay-per-view user. 

17. The computer-readable medium of claim 16 further including dynamically 
switching/adding a portion of the streaming spherical video images in accordance with selecting, 
by the user, alternative/additional simultaneous view locations. 

18. The computer-readable medium of claim 16 further including dynamically 
switching/altering a portion of the streaming spherical video images in accordance with 
selecting, by the user, alternative/additional simultaneous view locations using at least one 
different one of: pan, tilt, rotate, and zoom setting. 

19. The computer-readable medium of claim 16 further including the step of billing 
the user on a periodic basis based on a number of view locations selected for the time period and 
a total viewing time utilized. 

20. The computer-readable medium of claim 19 wherein billing of the user is 
accomplished by charging an amount due to a predetermined credit card of the user. 

21. The computer-readable medium of claim 19 wherein billing of the user is 
accomplished by automatically deducting an amount due from a bank account of the user. 

22. The computer-readable medium of claim 19 wherein billing of the user is 
accomplished by sending a bill for an amount due to the user. 

23. The computer-readable medium of claim 16, wherein the recording device is one 
of: a video recorder, a DVD, a CD-ROM, a magnetic tape system, an optical recorder and a 
digital recorder. 

24. A computer readable medium having computer readable instructions for 
permitting viewing of immersive video presentations comprising the steps of: 

receiving a data file containing an immersive video presentation; 

receiving a user input designating a desired direction of view; 

transforming, in real time, in response to said user input an image relating to a portion of 
said immersive video presentation. 

25. The computer readable medium according to claim 24, further comprising the step 
of: 

storing said data file in an alternate representation. 

26. A method for creating a three dimensional model of an environment, the method 
comprising the steps of: 

obtaining a first video image of the environment using a first video camera at a first 
position; 

obtaining a second video image of the environment using a second video camera at a 
second position different than the first position; 

comparing the first video image with the second video image; and 

generating a three dimensional model of the environment according to a result of the step 
of comparing. 

27. The method of claim 26, wherein the step of generating further includes 
performing edge extraction on at least one of the first and second video images. 

28. The method of claim 26, wherein the step of obtaining the first image includes 
obtaining the first video image using a first fisheye lens, and the step of obtaining the second 
image includes obtaining the second video image using a second fisheye lens. 

29. The method of claim 28, wherein the first fisheye lens is the second fisheye lens. 

30. The method of claim 28, wherein the first video camera is the second video 
camera, and the step of obtaining the second video image includes moving the first video camera 
to the second position and obtaining the second video image using the first video camera at the 
second position. 

31. The method of claim 30, wherein the first video camera is coupled to a flying 
machine, the method further including the flying machine flying so as to move the first video 
camera from the first position to the second position. 

32. The method of claim 30, wherein the first video camera is coupled to a platform, the 
platform moving so as to move the first video camera from the first position to the second 
position. 

33. The method of claim 26, further including painting a portion of the three 
dimensional model with a color of a corresponding portion of at least one of the first and second 
video images. 

34. The method of claim 33, wherein the step of painting includes texture-mapping the 
portion of the three dimensional model with the color of the corresponding portion of the at least 
one of the first and second video images. 

35. The method of claim 30, further including the step of measuring a distance 
between a third position associated with a position of the first video camera and a portion of the 
environment corresponding to the portion of the at least one of the first and second video images, 
the step of generating including correlating the at least one of the first and second video images 
with the distance measured and generating the three dimensional model of the environment 
based on the distance measured. 

36. The method of claim 35, further including using a laser range finder to measure 
the distance. 

37. A system for creating a three dimensional model of an environment, the system 
comprising: 

a first video camera configured to obtain a first video image of the environment from a 
first position; 






a second video camera configured to obtain a second video image of the environment 
from a second position different than the first position; and 

a processor coupled to the first and second video cameras and configured to compare the 
first video image with the second video image and generate a three dimensional model of the 
environment according to the comparison. 

38. The system of claim 37, wherein the first video camera is the second video 
camera. 

39. The system of claim 38, further including a distance measuring device coupled to 
the processor and configured to measure a distance between the distance measuring device and a 
portion of the environment corresponding to the portion of the at least one of the first and second 
video images, wherein the processor is configured to correlate the at least one of the first and 
second video images with the distance measured and generate the three dimensional model based 
on the distance measured. 

40. The system of claim 39, wherein the distance measuring device comprises a laser 
range finder. 

41. The system of claim 37, wherein the processor is further configured to perform 
edge extraction on at least one of the first and second video images in order to generate the three 
dimensional model. 

42. The system of claim 37, wherein the first and second video cameras each have a 
fisheye lens through which the first and second video images are obtained. 

43. The system of claim 37, wherein the processor is further configured to paint a 
portion of the three dimensional model with a color of a corresponding portion of at least one of 
the first and second video images. 

44. The system of claim 43, wherein the processor is further configured to paint the 
portion of the three dimensional model by texture-mapping the portion of the three-dimensional 
model with the color of the corresponding portion of the at least one of the first and second video 
images. 

45. A method for creating a three dimensional model of an environment, the method 
comprising the steps of: 

obtaining a first video image of the environment using a first video camera at a first time 
at which light is incident upon the environment at a first angle; 

obtaining a second video image of the environment using a second video camera at a 
second time different than the first time at which light is incident upon the environment at a 
second angle different from the first angle; 

comparing the first video image with the second video image; and 

generating a three dimensional model of the environment according to a result of the step 
of comparing. 

46. A method for remote collaboration at a first location of a plurality of locations and 
displaying an immersive video image with at least one user of a plurality of users at at least one of 
a plurality of remote locations, the method comprising: 

capturing the immersive real time video image at the first location; 

receiving the immersive video image at at least a first remote location; 

displaying the received immersive video image on a display at said first remote location; 

receiving user inputs for viewing perspective-corrected selected portions of the real time 
video image from a user at said first remote location; and 

displaying the selected portions of the real time video image as a perspective corrected 
image at real time video rates at said first location. 

47. A method for remote collaboration as recited in claim 46, further comprising: 

receiving the immersive video image at a second remote location; 

displaying the received immersive video image on a display at said second remote 
location; 

receiving user inputs for viewing perspective-corrected selected portions of the real time 
video image from a user at said second remote location, said selected portions being different 
from the selected portions selected by the user at the first remote location; and 

displaying the portions selected by the user at the second remote location as a perspective 
corrected image at real time video rates at said second location. 

48. The method as recited in claim 47, wherein the immersive video image is received at 
said first location via the Internet. 

49. The method as recited in claim 47, wherein the immersive video image is received at 
said first location via an Intranet. 

50. The method as recited in claim 47, wherein the immersive video image is received at 
said second location via the Internet. 

51. The method as recited in claim 47, wherein the immersive video image is received at 
the second location via an Intranet. 

52. A system for remote collaboration at a first location of a plurality of locations and 
displaying an immersive video image with at least one user of a plurality of users at at least one of 
a plurality of remote locations, the system comprising: 

an immersive video apparatus for capturing the immersive real time video image at the 
first location, said apparatus having at least one wide angle lens; 

a first receiver for receiving the immersive video image at at least a first remote location; 

a first display for displaying the received immersive video image at said first remote 
location; and 

a first input device for receiving user inputs for viewing perspective-corrected selected 
portions of the real time video image from a user at said first remote location, the first display 
further displaying the selected portions of the real time video image as a perspective corrected 
image at real time video rates at said first location. 

53. The system as recited in claim 52, further comprising: 

a second receiver for receiving the immersive video image at a second remote location; 

a second display for displaying the received immersive video image at said second 
remote location; and 

a second input device for receiving user inputs for viewing perspective-corrected selected 
portions of the real time video image from a user at said second remote location, said selected 
portions being different from the selected portions selected by the user at the first remote location, 
the second display displaying the portions selected by the user at the second remote location as a 
perspective corrected image at real time video rates at said second location. 

54. A method for real time remote surveillance comprising: 

capturing an immersive real time video surveillance image at a first location, said image 
displaying an entire region being monitored; 

receiving the immersive video surveillance image at at least one remote location; 

displaying the received immersive video surveillance image on a display at said at least 
one remote location; 

receiving user inputs for viewing perspective-corrected selected portions of the region 
being monitored from a user at said at least one remote location; and 

displaying the selected portions of the real time video image of the region being 
monitored as a perspective corrected image at real time video rates at said at least one remote 
location. 

55. A method for real time remote surveillance as recited in claim 54, further 
comprising: 

receiving additional user inputs for viewing additional perspective-corrected selected 
portions of the region being monitored from said user at said first remote location, said additional 
user inputs being different from said user inputs; and 

displaying the additional perspective-corrected selected portions of the region being 
monitored at the first remote location. 
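
The claims above repeatedly recite converting a user-selected portion of a fisheye or spherical image into a perspective-corrected planar view under pan, tilt, rotate, and zoom control. The following is only an illustrative sketch of one common way such a mapping can be computed; it is not taken from the specification. It assumes an equidistant fisheye lens covering a hemisphere (image radius proportional to the angle off the optical axis), and every function and parameter name is hypothetical.

```python
import math

def view_ray(u, v, width, height, fov_deg):
    """Unit ray through output pixel (u, v) of a planar view looking along +z.
    A smaller fov_deg corresponds to a higher zoom."""
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    x, y, z = u - width / 2, v - height / 2, f
    n = math.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n

def orient(ray, pan_deg, tilt_deg, rotate_deg):
    """Apply rotate (about z), then tilt (about x), then pan (about y)."""
    x, y, z = ray
    r, t, p = (math.radians(a) for a in (rotate_deg, tilt_deg, pan_deg))
    x, y = x * math.cos(r) - y * math.sin(r), x * math.sin(r) + y * math.cos(r)
    y, z = y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t)
    x, z = x * math.cos(p) + z * math.sin(p), -x * math.sin(p) + z * math.cos(p)
    return x, y, z

def fisheye_pixel(ray, cx, cy, radius):
    """Project a ray onto an equidistant fisheye image (assumed lens model).
    radius is the image-circle radius reached at 90 degrees off-axis."""
    x, y, z = ray
    theta = math.acos(max(-1.0, min(1.0, z)))  # angle from the optical axis
    r = radius * theta / (math.pi / 2)
    phi = math.atan2(y, x)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)

def source_pixel(u, v, width, height, pan_deg, tilt_deg, rotate_deg,
                 fov_deg, cx, cy, radius):
    """Fisheye coordinates to sample for output pixel (u, v)."""
    ray = orient(view_ray(u, v, width, height, fov_deg),
                 pan_deg, tilt_deg, rotate_deg)
    return fisheye_pixel(ray, cx, cy, radius)
```

Filling every output pixel from the location returned by `source_pixel` yields one perspective-corrected window; separate windows would simply use separate pan, tilt, rotate, and zoom values against the same captured image.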



Sheet 13 / 27 

Figure 15 (flowchart; reference numerals 1501 to 1506) 

Capture at least one video stream of event 
Select at least one viewing location 
Receive immersive video stream re at least one viewing location 
Receive user input / Correct selected portion 
Switch to new viewing location 
Bill user based on number of view locations / total time watched 
Receive user input / Correct selected portion 
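
The billing box of Figure 15, like claims 4, 11 and 19, bases the charge on the number of view locations selected and the total viewing time. A minimal sketch of that bookkeeping might look as follows; the rates, class name, and method names are hypothetical, since the source gives no pricing or data structures.

```python
from dataclasses import dataclass, field

@dataclass
class ViewingSession:
    # Hypothetical rates for illustration only; the source states none.
    fee_per_location: float = 1.00
    fee_per_minute: float = 0.10
    locations: set = field(default_factory=set)
    minutes_watched: float = 0.0

    def watch(self, location: str, minutes: float) -> None:
        """Record time spent at a selected viewing location."""
        self.locations.add(location)
        self.minutes_watched += minutes

    def amount_due(self) -> float:
        """Charge for distinct view locations plus total viewing time."""
        return round(len(self.locations) * self.fee_per_location
                     + self.minutes_watched * self.fee_per_minute, 2)
```

Switching back to a location already viewed adds viewing time but not a second location fee in this sketch; a real system could equally count every switch, which the source leaves open.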



INTERNATIONAL SEARCH REPORT 

International Application No: PCT/US 00/09463 

A. CLASSIFICATION OF SUBJECT MATTER 

IPC 7 H04N7/18 H04N7/16 

According to International Patent Classification (IPC) or to both national classification and IPC 

B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 

IPC 7 H04N G06F 

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 

Electronic data base consulted during the international search (name of data base and, where practical, search terms used) 

EPO-Internal, PAJ, WPI Data 

C. DOCUMENTS CONSIDERED TO BE RELEVANT 

Category: A 
Citation: US 5 185 667 A (ZIMMERMANN STEVEN D), 9 February 1993 (1993-02-09); cited in the application; the whole document 
Relevant to claim No.: 1, 15, 16, 24, 46, 47, 52, 54, 55 

Category: A 
Citation: US 5 691 765 A (KUPERSMIT CARL ET AL), 25 November 1997 (1997-11-25); page 2, line 19 - line 47; column 3, line 11 - column 4, line 27; figure 1 
Relevant to claim No.: 1, 8, 16, 24, 46, 52, 54 

Category: A 
Citation: WO 98 38590 A (REAL TIME BILLING INC), 3 September 1998 (1998-09-03); page 1, line 1 - page 4, line 26 
Relevant to claim No.: 1, 4-8, 12-14, 16, 19-22 

Further documents are listed in the continuation of box C. 

Patent family members are listed in annex. 

* Special categories of cited documents: 

"A" document defining the general state of the art which is not considered to be of particular relevance 

"E" earlier document but published on or after the international filing date 

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or other means 

"P" document published prior to the international filing date but later than the priority date claimed 

"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention 

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone 

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art 

"&" document member of the same patent family 

Date of the actual completion of the international search: 17 July 2000 

Date of mailing of the international search report: 24/07/2000 

Name and mailing address of the ISA: 
European Patent Office, P.B. 5818 Patentlaan 2 
NL - 2280 HV Rijswijk 
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl, 
Fax: (+31-70) 340-3016 

Authorized officer: Fuchs, P 

Form PCT/ISA/210 (second sheet) (July 1992) 

page 1 of 2 

C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT 

Citation: US 5 877 801 A (BUSKO NICHOLAS ET AL), 2 March 1999 (1999-03-02); the whole document 
Relevant to claim No.: 1, 8, 16, 24, 46, 52, 54 

Citation: WO 97 01241 A (OMNIVIEW INC), 9 January 1997 (1997-01-09); cited in the application; abstract; page 11, line 24 - page 13, line 15; page 22, line 32 - page 26, line 22; figures 4-6A 
Relevant to claim No.: 26, 28, 37, 45 

Form PCT/ISA/210 (continuation of second sheet) (July 1992) 

page 2 of 2 
INTERNATIONAL SEARCH REPORT 

Information on patent family members 

International Application No: PCT/US 00/09463 

Each block below gives the patent document cited in the search report with its publication 
date, followed by the patent family members and their publication dates. 

US 5185667, published 09-02-1993 
    EP 0539565        05-05-1993 
    EP 0971540        12-01-2000 
    JP 6501585        17-02-1994 
    US 5359363        25-10-1994 
    US 5313306        17-05-1994 
    WO 9221208        26-11-1992 
    US 5384588        24-01-1995 
    US 5903319        11-05-1999 
    US 5990941        23-11-1999 
    US RE36207        04-05-1999 
    US 5764276        09-06-1998 
    US 5877801        02-03-1999 

US 5691765 A, published 25-11-1997 
    AU 706467 B       17-06-1999 
    AU 6602096 A      26-02-1997 
    BR 9609894 A      25-05-1999 
    CA 2220960 A      13-02-1997 
    CN 1192312 A      02-09-1998 
    EP 0842580 A      20-05-1998 
    JP 11510341 T     07-09-1999 
    WO 9705741 A      13-02-1997 

WO 9838590 A, published 03-09-1998 
    AU 6342498 A      18-09-1998 
    EP 0974113 A      26-01-2000 
    US 5960416 A      28-09-1999 

US 5877801 A, published 02-03-1999 
    US 5384588 A      24-01-1995 
    US 5359363 A      25-10-1994 
    US 5185667 A      09-02-1993 
    US 5990941 A      23-11-1999 
    US 5764276 A      09-06-1998 
    US 5903319 A      11-05-1999 
    US 6002430 A      14-12-1999 
    EP 0610863 A      17-08-1994 
    JP 3012142 B      21-02-2000 
    JP 7093526 A      07-04-1995 
    JP 2000083242 A   21-03-2000 
    US 5313306 A      17-05-1994 
    EP 0539565 A      05-05-1993 
    EP 0971540 A      12-01-2000 
    JP 6501585 T      17-02-1994 
    WO 9221208 A      26-11-1992 
    US RE36207 E      04-05-1999 

WO 9701241 A, published 09-01-1997 
    US 5990941 A      23-11-1999 
    AU 6386696 A      22-01-1997 
    EP 0834232 A      08-04-1998 
    JP 11508384 T     21-07-1999 
    US 5764276 A      09-06-1998 
    US 6002430 A      14-12-1999 

Form PCT/ISA/210 (patent family annex) (July 1992) 
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 

CORRECTED VERSION 

(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date: 12 October 2000 (12.10.2000) 

(10) International Publication Number: WO 00/60869 A1 

(51) International Patent Classification 7: H04N 7/18, 7/16 

(21) International Application Number: PCT/US00/09463 

(22) International Filing Date: 10 April 2000 (10.04.2000) 

(25) Filing Language: English 

(26) Publication Language: English 

(30) Priority Data: 60/128,613, 8 April 1999 (08.04.1999), US 

(71) Applicant (for all designated States except US): INTERNET PICTURES CORPORATION 
[US/US]; Suite 100, 1009 Commerce Park Drive, Oak Ridge, TN 37830 (US). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): ZIMMERMANN, Steven, D. [US/US]; Internet 
Pictures Corporation, Suite 100, 1009 Commerce Park Drive, Oak Ridge, TN 37830 (US). 
GOURLEY, Christopher, Shannon [US/US]; Internet Pictures Corporation, Suite 100, 
1009 Commerce Park Drive, Oak Ridge, TN 37830 (US). 

(74) Agents: GLEMBOCKI, Christopher, R. et al.; Banner & Witcoff, Ltd., Eleventh Floor, 
1001 G Street, N.W., Washington, DC 20001-4597 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, 
DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, 
ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, 
LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, 
PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, 
TZ, UA, UG, US, UZ, VN, YU, ZA, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), Eurasian patent 
(AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent 
(AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, 
MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, 
GA, GN, GW, ML, MR, NE, SN, TD, TG). 

Published: 
with international search report 

(48) Date of publication of this corrected version: 4 April 2002 

(15) Information about Correction: see PCT Gazette No. 14/2002 of 4 April 2002, Section II 

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and 
Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette. 

(54) Title: PERSPECTIVE-CORRECTED VIDEO PRESENTATIONS 

(57) Abstract: A system and method for capturing and presenting immersive video presentations 
is described. A variety of different implementations are disclosed including multiple stream 
pay-per-view, sporting event coverage and 3D image modelling from the immersive video 
presentations. 



Immersive Video Presentations 

Related References 

This application claims the benefit of U.S. Provisional Application No. 60/128,613, filed 
on April 8, 1999, which is hereby entirely incorporated herein by reference. The following 
disclosures are filed concurrently herewith and are expressly incorporated by reference for any 
essential material. 

1. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86946) entitled 
"Remote Platform for Camera". 

2. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86942) entitled 
"Virtual Theater". 

3. U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86949) entitled 
"Method and Apparatus for Providing Virtual Processing Effects for Wide-Angle Video 
Images". 

Technical Field 

In general, the present invention relates to capturing and viewing images. More 
particularly, the present invention relates to capturing and viewing spherical images in a 
perspective-corrected presentation. 

Background Of the Invention 

With the advent of television and computers, man has pursued the goal of tele-presence: 
the perception that one is at another place. Television permits a limited form of tele-presence 
through the use of a single view of a television screen. However, one is continually confronted 
with the fact that the view provided on a television screen is controlled by another, primarily the 
camera operator. 

Using an example of a roller coaster, a television presentation of a roller coaster ride 
would generally start with a rider's view. However, the user cannot control the direction of 
viewing so as to see, for example, the next curve in the track. Accordingly, users merely see 
what a camera operator intends for them to see at a given location. 

Computer systems, through different modeling techniques, attempt to provide 
a virtual environment to system users. Despite advances in computing power and 
rendering techniques permitting multi-faceted polygonal representation of objects and 
three-dimensional interaction with the objects (see, for example, first person video 
games including Half-life and Unreal), users remain wanting a more realistic 
experience. So, using the roller coaster example above, a computer system may 
display the roller coaster in a rendered environment, in which a user may look in 
various directions while riding the roller coaster. However, the level of detail is 
dependent on the processing power of the user's computer as each polygon must be 
separately computed for distance from the user and rendered in accordance with 
lighting and other options. Even with a computer with significant processing power, 
one is left with the unmistakable feeling that one is viewing a non-real environment. 

Summary 

The present invention discloses an immersive video capturing and viewing
system. Through the capture of at least two images, the system allows for a video data
set of an environment to be captured. The immersive presentation may be streamed or
stored for later viewing. Various implementations are described here, including
surveillance, pay-per-view, authoring, 3D modeling and texture mapping, and related
implementations.

In one embodiment, the present invention provides pay-per-view interaction
with immersive videos. The present invention provides for the generation of a wide
angle image at one location and for the transmission of a signal corresponding to that
image to another location, with the received transmission being processed so as to
provide a pay-per-view perspective-corrected view of any selected portion of that
image at the other location. The present invention provides for the generation of a
wide angle image at one location and for the transmission of a signal corresponding to
that image to another location, with the received transmission being processed so as to
provide at a plurality of stations a perspective-corrected view of any selected portion
of that image at any pre-selected positioning with respect to the event being viewed,
with each station/user selecting a desired perspective-corrected view that may be
varied according to a predetermined pay-per-view scheme.

The present invention provides for the generation of a wide angle image at one
location and for the transmission of a signal corresponding to that image to a plurality
of other locations, with the received transmission at each location being processed in
accordance with pay-per-view user selections so as to provide a perspective-corrected
view of any selected portion of that image, with the selected portion being selected at
each of the plurality of other locations.

Accordingly, the present invention provides an apparatus that can provide, on
a pay-per-view basis, an image of any portion of the viewing space within a selected
field-of-view without moving the apparatus to another location, and then
electronically correct the image for visual distortions of the view.

The present invention provides for the pay-per-view user to select the degree
of magnification or scaling desired for the image (zooming in and out) electronically,
and where desired, to provide multiple images on a plurality of windows with
different orientations and magnification simultaneously from a single input spherical
video image.

A pay-per-view system may produce the equivalent of pan, tilt, zoom, and
rotation within a selected view, transforming a portion of the video image based upon
user or pre-selected commands, and producing one or more output images that are in
correct perspective for human viewing in accordance with the user pay-per-view
selections. In one embodiment, the incoming image is produced by a fisheye lens that
has a wide angle field-of-view. This image is captured into an electronic memory
buffer. A portion of the captured image, either in real time or as prerecorded,
containing a region-of-interest is transformed into a perspective corrected image by an
image processing computer. The image processing computer provides mapping of the
image region-of-interest into a corrected image using, for example, an orthogonal set
of transformation algorithms. The original image may comprise a data set comprising
all effective information captured from a point in space. Allowance is made for the
platform (tripod, remote control robot, stalk supporting the lens structure, and the
like). Further, the data set may be modified by eliminating the top and bottom
portions as, in some instances, these regions do not contain unique material (for
example, when the view straight up shows only a clear sky). The data set may be stored
in a variety of formats including equirectangular, spherical (as shown, for example, in
U.S. Patent Nos. 5,684,937, 5,903,782, and 5,936,630 to Oxaal), cubic,
bi-hemispherical, panoramic, and other representations as are known in the art. The
conversion from one representation to others is within the scope of one of ordinary
skill in the art.

The viewing orientation is designated by a command signal generated by either
a human operator or computerized input. The transformed image is deposited in an
electronic memory buffer where it is then manipulated to produce the output image or
images as requested by the command signal.
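
By way of illustration only, the mapping from a commanded view (pan, tilt, and zoom)
to a location in the captured fisheye image may be sketched as follows. The sketch
assumes an ideal equidistant fisheye lens (image radius proportional to the angle from
the optical axis) and omits rotation; the function name, parameters, and lens model are
assumptions made for this example and are not taken from the referenced patents.

```python
import math


def view_to_fisheye(u, v, out_w, out_h, pan_deg, tilt_deg, fov_deg,
                    cx, cy, radius, lens_fov_deg=180.0):
    """Map output pixel (u, v) of a perspective view to fisheye coordinates.

    Assumes an equidistant fisheye looking along +z whose image circle has
    centre (cx, cy) and the given radius. Returns (x, y) in the fisheye
    image, or None when the ray falls outside the lens coverage.
    """
    # Focal length of the virtual perspective camera; a narrower field of
    # view gives a longer focal length, i.e. more zoom.
    f = (out_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

    # Ray through the output pixel in the virtual camera frame.
    x = u - out_w / 2.0
    y = v - out_h / 2.0
    z = f

    # Tilt: rotate about the x axis (positive tilt looks up, image y is down).
    t = math.radians(tilt_deg)
    y, z = y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t)

    # Pan: rotate about the y axis (positive pan looks right).
    p = math.radians(pan_deg)
    x, z = x * math.cos(p) + z * math.sin(p), -x * math.sin(p) + z * math.cos(p)

    # Angle from the optical axis and azimuth around it.
    theta = math.atan2(math.hypot(x, y), z)
    half = math.radians(lens_fov_deg) / 2.0
    if theta > half:
        return None

    # Equidistant model: radius in the image grows linearly with theta.
    r = radius * theta / half
    phi = math.atan2(y, x)
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))
```

Evaluating this function for every pixel of an output window, and sampling the fisheye
image at the returned coordinates, would yield one perspective-corrected view; several
windows with different orientations could be produced from the same input image.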

The present invention may utilize a lens supporting structure which provides
alignment for an image capture means wherein the alignment produces captured
images that are aligned for easy seaming together of the captured images to form
spherical images that are used to produce multiple streams for providing viewing of
an event at different positions/locations by a pay-per-view user.

A video apparatus with a camera having at least two wide-angle lenses,
such as fish-eye lenses with fields of view of at least 180 degrees, produces electrical
signals that correspond to images captured by the lenses. It is appreciated that three
120 or more degree lenses may be used (for example, three 180 degree lenses
producing an overlap of 60 degrees per lens). Further, four 90 or more degree lenses
may be used as well.

These electrical signals, which are distorted because of the curvature of the
lens, are input to apparatus, digitized, and seamed together into an immersive video.
Despite some portions being blocked by a supporting platform (for example, as
described in concurrently filed U.S. Serial No. (01096.86946) entitled "Remote
Platform for Camera", whose contents are incorporated herein), the resulting
immersive video provides a user with the ability to navigate to a desired viewing
location while the video is playing.

The immersive video may have portions After creating each spherical video
image, the apparatus may transmit a portion representing a view selected by the
pay-per-view user, or alternatively, may compress each image using standard data
compression techniques and then store the images in a magnetic medium, such as a
hard disk, for display at real time video rates or send compressed images to the user,
for example over a telephone line.

At each pay-for-play location where viewing is desired, there is apparatus for
receiving the transmitted signal. In the case of the telephone line transmission,
"decompression" apparatus is included as a portion of the receiver. The received
signal is then digitized. A selected portion of the multi-stream transmission of the
pay-for-play view of the event is selected by the pay-for-play viewer and a selected
portion of the digitized signal, as selected by operator commands, is transformed
using the algorithms of the above-cited U.S. Pat. No. 5,185,667 into a
perspective-corrected view corresponding to that selected portion. This selection by
operator commands includes options of pan, tilt, and rotation, as well as degrees of
magnification.

Command signals are sent by the pay-for-play user to at least a first transform 
unit to select the portion of the multi-stream transmission of the viewing event that is 
desired to be seen by the user. 

These and other objects of the present invention will become apparent upon
consideration of the drawings hereinafter in combination with a complete description
thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a block diagram of a single lens image capture system in
accordance with embodiments of the present invention.

Figure 2 shows a block diagram of a multiple lens image capture in
accordance with embodiments of the present invention.

Figure 3 shows a tele-centrically-opposed image capture system in accordance
with embodiments of the present invention.

Figure 4 shows an alternative image capture system in accordance with
embodiments of the present invention.

Figure 5 shows yet another alternative image capture system in accordance
with embodiments of the present invention.

Figure 6 shows a developing process flow in accordance with embodiments of
the present invention.

Figure 7 shows various image capture systems and distribution systems in
accordance with embodiments of the present invention.

Figure 8 shows various seaming systems in accordance with embodiments of
the present invention.




Figure 9 shows distribution systems in accordance with embodiments of the
present invention.

Figure 10 shows a file format in accordance with embodiments of the present
invention.

Figure 11 shows alternative image representation data structures in accordance
with embodiments of the present invention.

Figure 12 shows a temporal hotspot actuation process in accordance with
embodiments of the present invention.

Figure 13 shows a pay-per-view process in accordance with embodiments of
the present invention.

Figure 14 shows a pay-per-view system in accordance with embodiments of
the present invention.

Figure 15 shows another pay-per-view system in accordance with
embodiments of the present invention.

Figure 16 shows yet another pay-per-view system in accordance with
embodiments of the present invention.

Figure 17 shows a stadium with image capture points in accordance with
embodiments of the present invention.

Figure 18 provides a representation of the images captured at the image
capture points of Figure 17 in accordance with embodiments of the present invention.

Figure 19 shows the image capture perspectives with additional perspectives
in accordance with embodiments of the present invention.

Figure 20 shows another perspective of the system of Figure 19 with a
distribution system in accordance with embodiments of the present invention.

Figure 21 shows an effective field of view concentrating on a playing field in
accordance with embodiments of the present invention.

Figure 22 shows a system for overlaying generated images on an immersive
presentation stream in accordance with embodiments of the present invention.

Figure 23 shows an image processing system for replacing elements in
accordance with embodiments of the present invention.

Figure 24 shows a boxing ring in accordance with embodiments of the present
invention.

Figure 25 shows a pay-per-view system in accordance with embodiments of
the present invention.

Figure 26 shows various image capture systems in accordance with
embodiments of the present invention.

Figure 27 shows image analysis points as captured by the systems of Figure 26
in accordance with embodiments of the present invention.

Figure 28 shows various images as captured with the systems of Figure 26 in
accordance with embodiments of the present invention.

Figure 29 shows a laser range finder with an immersive lens combination in
accordance with embodiments of the present invention.

Figure 30 shows a three-dimensional model extraction system in accordance
with embodiments of the present invention.

Figures 31A-C show various implementations of the system in applications in
accordance with embodiments of the present invention.

Detailed Description 

The system relates to an immersive video capture and presentation system. In
capturing and presenting immersive video presentations, the system, through the use
of 180 or more degree fish eye lenses, captures 360 degrees of information. As will be
appreciated from the description, other lens combinations may be used as well,
including cameras equipped with lenses of less than 180 degrees fields of view and
capturing separate images for seaming. Further, not all data needs to be captured to
accomplish the goals of the present invention. Specifically, panoramic data sets that
lack a top or bottom portion (e.g., the top or bottom 20 degrees) may be used.
Moreover, data sets of more than 360 degrees may be used (for example, 370 degrees
(from two 185 degree lenses) or 540 degrees (from three 180 degree lenses)) for
additional image capture. Accordingly, for simplicity, reference is made to 360 degree
views or spherical data sets. However, it is readily appreciated that alternative data
sets or videos with different amounts of coverage (greater or less than 360 degrees)
may be used equally as well.
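
The coverage figures given above follow from simple arithmetic, which may be sketched
as below. The sketch assumes identical lenses spaced evenly around a single 360 degree
circle; the function name is hypothetical and introduced only for this example.

```python
def lens_coverage(lens_fov_degrees, lens_count):
    """Return (total degrees captured, overlap in degrees per lens).

    Assumes identical lenses spaced evenly around one 360 degree circle.
    """
    total = lens_fov_degrees * lens_count
    # Anything captured beyond 360 degrees is seen by more than one lens.
    overlap_total = max(0, total - 360)
    return total, overlap_total / lens_count


print(lens_coverage(185, 2))  # two 185 degree lenses
print(lens_coverage(180, 3))  # three 180 degree lenses
```

Under these assumptions, two 185 degree lenses give 370 degrees in total, and three
180 degree lenses give 540 degrees with 60 degrees of overlap per lens, matching the
examples in the text.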

It is appreciated that all methods may be implemented in computer readable 
mediums in addition to hardware. 

Figure 1 shows a block diagram of a single lens image capture system in
accordance with embodiments of the present invention. Figure 1 is a block diagram of
one embodiment of an immersive video image capture method using a single fisheye
lens capture system for use with the present invention. The system includes a fish-eye
lens (which may be greater or less than 180 degrees), an image capture sensor and
camera electronics, a compression interface (permitting compression to different
standards including MPEG and MJPG, or leaving the file uncompressed), and a
computer system for recording and storing the resulting image. Also shown in Figure
1 is a resulting circular image as captured by the lens. The image capture system as
shown in Figure 1 captures images and outputs the video stream to be handled by the
compression system.

Figure 2 shows a block diagram of a multiple lens image capture in
accordance with embodiments of the present invention. Figure 2 shows two back to
back camera systems (as shown in U.S. Patent No. 6,002,430, which is incorporated
by reference), a sensor interface, a seaming interface, a compression interface, and a
communication interface for transmitting the received video signal onto a
communications system. The received transmission is then stored in a capture/storage
system.

Figure 3 shows a tele-centrically-opposed image capture system in accordance
with embodiments of the present invention. Figure 3 details a first objective lens 301
and a second objective lens 302. Both objective lenses transmit their received images
to a prism mirror 303 which reflects the image from objective lens 301 up and the
image from objective lens 302 down. Supplemental optics 304 and 305 may then be
used to form the images on sensors 306 and 307. An advantage to having
tele-centrically opposed optics as shown in Figure 3 is that the linear distance between
lens 301 and lens 302 may be minimized. This minimization attempts to eliminate
non-captured regions of an environment due to the separation of the lenses. The
resulting images are then sent to sensor interfaces 308, 309 as controlled by camera
dual sensor control 310. Camera dual sensor interface 310 may receive control inputs
addressing irising among the two optical paths, color matching between the two
images (due to, for example, color variations in the optics 301, 302, 304, 305, and in
the sensors 306, 307), and other processing as further defined in Figure 11 and in U.S.
Serial No. (01096.86949), referenced above. Both image streams are input into a
seaming interface where the two images are aligned. The alignment may take the form
of aligning the first pair, or sets of pairs, and applying the correction to all remaining
images, or at least the images contained in a captured video scene.

The seamed video is input into compression system 312 where the video may
be compressed for easier transmission. Next, the compressed video signal is input to
communication interface block 313 where the video is prepared for transmission. The
video is next transmitted via communication interface 314 to a communications
network. Receiving the video from the communications network is an image capture
system (for example, a user's computer) 315. A user specifies 316 a selected portion
or portions of the video signal. The portions may comprise directions of view (as
detailed in U.S. Patent No. 5,185,667, whose contents are expressly incorporated
herein). The selected portion or portions may originate with a mouse, joystick,
positional sensors on a chair, and the like as are known in the art and further including
a head mounted display with a tracking system. The system further includes a storage
317 (which may include a disk drive, RAM, ROM, tape storage, and the like). Finally,
a display is provided as 319. The display may take the shape of the display systems as
embodied in U.S. Serial No. (01096.86942).

Figure 4 shows an alternative image capture system in accordance with
embodiments of the present invention. Similar to that of Figure 3, Figure 4 shows an
image capture system with a mirror prism directing images from the objective lenses
to a common sensor interface. The sensor interface 401 may be a single sensor or a
dual sensor. Other elements are similar to those of Figure 3.

Figure 5 shows yet another alternative image capture system in accordance
with embodiments of the present invention. Figure 5 shows an embodiment similar to
that of Figure 4 but using light sensitive film. In this embodiment, different film sizes
(35 mm, 16 mm, super 35 mm, super 16 mm and the like) may be used to capture the
image or images from the optics. Figure 5 shows different orientations for storing
images on the film. In particular, the images may be arranged horizontally, vertically,
etc. An advantage of the super 16 mm and super 35 mm film formats is that they
approximate a 2:1 aspect ratio. With this ratio, two circular images from the optics
may be captured next to each other, thereby maximizing the amount of a frame of film
used.

Figure 6 shows a process flow for developing and processing the film from the
film plane into an immersive movie. The film 601 is developed in developer 602. The
developed film 603 is scanned by scanner 604 and the result is stored in storage 605.
The storage may also comprise a disk, diskette, tape, RAM or ROM 606. The images
are seamed together and melded into an immersive presentation in 607. Finally, the
output is stored in storage 608.

Figure 7 shows various image capture systems and distribution systems in
accordance with embodiments of the present invention. Capture system cameras 701
may represent 180 degree fish eye lenses, super 180 (233 degrees and greater) fish
eye lenses, the various back to back image capture devices shown above, digital
image capture, and film capture. The result of the image capture in 701 may be sent to
a storage 702 for processing by authoring tools 703 and later storage 704, or may be
streamed live 705 to a delivery/distribution system. The communication link 706
distributes the stored information and sends it to at least one file server 707 (which may
comprise a file server for a web site) so as to distribute the information over a network
709. The distribution system may comprise a unicast transmission or a multicast 708
as these techniques of distributing data files are known in the art. The resulting
presentations are received by network interface devices 710 and used by users. The
network interface devices may include personal computers, set-top boxes for cable
systems, game consoles, and the like. A user may select at least one portion of the
resulting presentation with the control signals being sent to the network interface
device to render a perspective correct view for a user.

Instead of transmitting the presentation over a network (e.g., the Internet), the
presentation may be separately authored or mastered 711 and placed in a fixed
medium 712 (that may include DVDs, CD-ROMs, CD-Videos, tapes, and solid
state storage (e.g., Memory Sticks by the Sony Corporation)).

Figure 8 shows various seaming systems in accordance with embodiments of
the present invention. Input images may comprise two or more separate images 801A
or combined images with two spherical images on them 801B. 801A and 801B show
an example where lenses of greater than 180 degrees were used to capture an
environment. Accordingly, an image boundary is shown and a 180-degree boundary is
shown on each image. By defining the 180 degree boundary, one is able to more
easily seam images as one would know where overlapping portions of the image
begin and end. Further, the resolution of the resulting image may depend on the
sampling method used to create the representations of 801A and 801B. The
boundaries of the image are detected in system 802. The system may also find the
radius of the image circle. In the case of offsets or warping to an ellipse, major and
minor radii may be found. Further, from these values, the center of the image may be
found (h,v). Next, image enhancement methods may be applied in step 803 if needed.
The enhancement methods may include radial filtering (to remove brightness shifts as
one moves from the center of the lens), color balancing (to account for color shifts
due to lens color variations or sensor variations, for example, having a hot or cold
gamma), flare removal (to eliminate lens flare), anti-aliasing, scaling, filtering, and
other enhancements. Next, the boundaries of the images are matched 804 where one
may filter or blend or match seams along the boundaries of the images. Next, the
images are brought into registration through the registration alignment process 805.
These and related techniques may be found in co-pending PCT Reference No.
PCT/US99/07667 filed on April 8, 1999, whose disclosure is incorporated by
reference.

Finally, the seaming and alignment applied in step 805 is applied to the
remaining video sequences, resulting in the immersive image output 806.
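
As a non-limiting illustration of the boundary detection of system 802, the center
(h,v) and the radii of the image circle might be estimated from the extent of the lit
pixels. The sketch below assumes a grayscale frame given as a list of rows, a dark
surround outside the image circle, and an axis-aligned ellipse; the function name and
the threshold value are assumptions made for this example.

```python
def find_image_circle(pixels, threshold=16):
    """Estimate the centre (h, v) and the radii of a fisheye image circle.

    `pixels` is a list of rows of grayscale values. Pixels brighter than
    `threshold` are treated as lying inside the image circle. Returns
    (h, v, horizontal_radius, vertical_radius), or None if nothing is lit.
    """
    lit_rows = [r for r, row in enumerate(pixels)
                if any(p > threshold for p in row)]
    lit_cols = [c for c in range(len(pixels[0]))
                if any(row[c] > threshold for row in pixels)]
    if not lit_rows or not lit_cols:
        return None

    top, bottom = lit_rows[0], lit_rows[-1]
    left, right = lit_cols[0], lit_cols[-1]

    # The centre is the midpoint of the lit extent in each direction.
    h = (left + right) / 2.0
    v = (top + bottom) / 2.0

    # Unequal radii indicate warping of the circle to an ellipse.
    return h, v, (right - left) / 2.0, (bottom - top) / 2.0
```

Because the seaming and alignment found for one frame may be applied to the remaining
frames of a sequence, an estimate of this kind would only need to be computed once per
captured scene.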

Figure 9 shows distribution systems in accordance with embodiments of the
present invention. Immersive video sequences are received at a network interface 905
(from lens system 901 and combination interfaces 902 or storage 903 and video server
904). The network interface outputs the image via a satellite link 906 to viewers
(including set-top boxes, personal computers, and the like). Alternatively, the system
may broadcast the immersive video presentation via a digital television broadcast 907
to receivers (comprising, for example, set-top boxes, personal computers, and the like).
Moreover, the immersive video experience may be transmitted via ATM, broadband,
the Internet, and the like 908. The receiving devices may be personal computers,
set-top boxes and the like.

Likewise, global positioning system data may be captured simultaneously with
the image or by pre-recording or post-recording the location data as is known from the
surveying art. The object is to record the precise latitude and longitude global
coordinates of each image as it is captured. Having such data, one can easily associate
front and back hemispheres with one another for the same image set (especially when
considered with time and date data). The path of image taking from one picture to the
next can be permanently recorded and used, for example, to reconstruct a picture tour
taken by a photographer when considered with the date and time of day stamps.
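
For illustration, the association of front and back hemispheres by location and time
might be sketched as follows. The record layout, the tolerance values, and the names
used are assumptions made for this example and are not prescribed by the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    name: str
    hemisphere: str   # "front" or "back"
    timestamp: float  # seconds, from the time and date stamp
    lat: float        # degrees latitude
    lon: float        # degrees longitude


def pair_hemispheres(captures, max_seconds=1.0, max_degrees=1e-4):
    """Pair each front hemisphere with a back hemisphere taken at nearly
    the same time and place. Returns a list of (front_name, back_name)."""
    fronts = [c for c in captures if c.hemisphere == "front"]
    backs = [c for c in captures if c.hemisphere == "back"]
    used = set()
    pairs = []
    for f in fronts:
        for i, b in enumerate(backs):
            if i in used:
                continue
            if (abs(f.timestamp - b.timestamp) <= max_seconds
                    and abs(f.lat - b.lat) <= max_degrees
                    and abs(f.lon - b.lon) <= max_degrees):
                pairs.append((f.name, b.name))
                used.add(i)
                break
    return pairs
```

Sorting the paired records by timestamp would then give the path of image taking from
one picture to the next.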

Other data may be automatically recorded in memory as well (not shown)
including names of human subjects, brief description of the scene, temperature,
humidity, wind velocity, altitude and other environmental factors. These auxiliary
digital data files associated with each image captured would only be limited in type by
the provision of appropriate sensing and/or measuring equipment and the access to
digital memory at the time of image capture. One or more or all of these capabilities
may be built into a wide angle digital camera system.

Figure 10 shows a file format in accordance with embodiments of the present
invention. The file format comprises a data structure including an immersive
image stream 1001 and an accompanying audio stream 1002. Here, immersive image
stream 1001 is shown with two scenes 1001A and 1001B. In one embodiment, the
audio stream is spatially encoded. In another embodiment, the audio portion is not so
encoded. By encoding the audio stream, the user is presented with a more immersive
experience. However, by not encoding the stream, the amount of non-image information
transmitted is reduced. The technique for spatial encoding is described in greater
detail in U.S. Serial No. (01096.86942) entitled "Virtual Theater", filed herewith and
incorporated by reference. To minimize data content and attempt to increase image
transfer rates, one embodiment only uses the combination of the image stream and the
audio stream to provide the immersive experience. However, alternate embodiments
permit the addition of information that enables tracking of where the
immersive image was captured (location information 1003 including, for example,
GPS information), enables the immersive experience to have a predefined navigation
(auto navigation stream 1004), enables linking between immersive streams (linked hot
spot stream 1005), enables additional information to be overlaid onto the immersive
video stream (video overlay stream 1006), enables sprite information to be encoded
(sprite stream 1007), enables visual effects to be combined on the image stream
(visual effects stream 1008 which may incorporate transitions between scenes), enables
position feedback information to be recorded (position feedback stream 1009),
enables timing (time code 1010), and enables enhanced music to be added (MIDI stream
1011). It is appreciated that various ones of the data format fields may be added and
removed as needed to increase or decrease the bandwidth consumed and file size of
the immersive video presentation.
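
One possible in-memory representation of such a data structure is sketched below,
purely as an illustration. The class name, field names, and use of raw bytes are
assumptions made for this example; only the stream names and reference numerals come
from Figure 10 as described above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Optional streams of Figure 10, keyed by a short name (names are illustrative).
OPTIONAL_STREAMS = {
    "location": 1003,
    "auto_navigation": 1004,
    "linked_hot_spot": 1005,
    "video_overlay": 1006,
    "sprite": 1007,
    "visual_effects": 1008,
    "position_feedback": 1009,
    "time_code": 1010,
    "midi": 1011,
}


@dataclass
class ImmersivePresentation:
    image_stream: bytes                    # immersive image stream 1001
    audio_stream: bytes                    # audio stream 1002
    audio_spatially_encoded: bool = False  # spatial encoding is optional
    optional: Dict[str, bytes] = field(default_factory=dict)

    def add_stream(self, name, data):
        """Attach one of the optional streams, increasing the file size."""
        if name not in OPTIONAL_STREAMS:
            raise KeyError(name)
        self.optional[name] = data

    def remove_stream(self, name):
        """Drop an optional stream to reduce bandwidth and file size."""
        self.optional.pop(name, None)

    def size(self):
        """Total payload size in bytes."""
        return (len(self.image_stream) + len(self.audio_stream)
                + sum(len(d) for d in self.optional.values()))
```

Keeping the image and audio streams mandatory and the remaining streams optional mirrors
the minimal embodiment described above, in which only those two streams are used.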

Figure 10 also shows an embodiment where the pay-per-view embodiment of
the present invention uses the described data format. For example, the pay-per-view
embodiment allows a user to select a location for viewing an event, such as, for
example, the 20 yard line for a football game, and the delivery system isolates the
data needed from the spherical video image that will provide a view from the selected
location and sends it to the pay-for-view event control transceiver 2302 for viewing
on a display 2304 by the user. The user may select a plurality of locations for viewing
that may be delivered to a plurality of windows on his display. Also, the user may
adjust a view using pan, tilt, rotate, and zoom. In addition, the viewing location may
be associated with an object that is moving in the event. For example, by selecting the
basketball as the location of the view, the display will place the basketball at or near
the center of the window and will track the movement of the basketball, i.e., the
window will show the basketball at or near the center of the screen and the camera
will follow the movement of the basketball by shifting the display to maintain the
basketball at or near the center of the screen as the basketball game proceeds. In a
sport such as golf, the display may be adjusted to zoom back to encompass a large area
and place a visible screen marker on the golf ball, and where selected by the user, may
leave a path such as is seen with "mouse tails" on a computer screen when the mouse
is moved, to facilitate the user's viewing of the path of the golf ball.
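
A minimal sketch of keeping a tracked object at or near the center of a window, and of
retaining a short trail of past positions, is given below. It assumes that the object's
position in the frame is already supplied by some tracking means not described here, and
that the window is a rectangular crop of a flat frame; all names are hypothetical.

```python
from collections import deque


def centered_window(obj_x, obj_y, win_w, win_h, frame_w, frame_h):
    """Return the (left, top) corner of a window centred on the object.

    The window is clamped to the frame, so near an edge the object ends up
    near, rather than exactly at, the centre of the window.
    """
    left = min(max(obj_x - win_w // 2, 0), frame_w - win_w)
    top = min(max(obj_y - win_h // 2, 0), frame_h - win_h)
    return left, top


class Trail:
    """Keeps the most recent object positions, like "mouse tails"."""

    def __init__(self, length):
        self.points = deque(maxlen=length)

    def add(self, x, y):
        self.points.append((x, y))
```

Calling `centered_window` once per frame shifts the display with the object, and drawing
the points held by `Trail` would show the recent path of, for example, a golf ball.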

In short, a pay-per-view system may transmit the entire immersive
presentation and let the user determine the direction of view or, alternatively, the
system may transmit only a pre-selected portion of the immersive presentation for
passive viewing by a consumer. Further, it is appreciated that a combination of both
may be used in practice of the invention without undue experimentation.

Figure 11 shows alternative image representation data structures in accordance
with embodiments of the present invention. The top portion of Figure 11 shows
different image formats that may be used with the present invention. The image
formats include: front and back portions of a sphere not flipped, sphere-vertical not
flipped, a single hemisphere (which may also be a spherical representation as shown
in U.S. Patent Nos. 5,684,937, 5,903,782, 5,936,630 to Oxaal), a cube, a
sphere-horizontal flipped, a sphere vertical flipped, a pair of mirrored hemispheres, and a
cylindrical view, all collectively shown as 1101.

The input images are input into an image processing section (as described in
U.S. Patent Application Serial No. , (Attorney Docket No. 01096.86949) entitled
"Method and Apparatus for Providing Virtual Processing Effects for Wide-Angle
Video Images"). The image processing section may include some or all of the
following filters including a special effects filter 1102 (for transitioning between
scenes, for example, between scenes 1001A and 1001B). Also, video filters 1105 may
include a radial brightness regulator that compensates for loss of image brightness.
Color match filter 1103 adjusts the color of the received images from the various
cameras to account for color offsets from heat, gamma corrections, age, sensor
condition, and other situations as are known in the art. Further, the system may
include an image segment replicator to replicate pixels around a portion of an image
occulted by a tripod mount or other platform supporting structure. Here, the replicator
is shown as replacing a tripod cap 1104. Seam blend 1106 allows seams to be
matched and blended as shown in PCT/US99/07667 filed April 8, 1999. Finally,
process 1107 adds an audio track that may be incorporated as audio stream 1002
and/or MIDI stream 1011. The output of the processors results in the immersive video
presentation 1108.
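
As one hypothetical form of a radial brightness regulator, a gain that increases with
distance from the center of the image circle may be applied to each pixel. The quadratic
gain profile and the `edge_gain` parameter below are assumptions chosen for
illustration; an actual lens would call for a measured falloff curve.

```python
import math


def radial_gain(x, y, cx, cy, radius, edge_gain=1.5):
    """Gain of 1.0 at the centre rising quadratically to edge_gain at the rim."""
    r = min(math.hypot(x - cx, y - cy) / radius, 1.0)
    return 1.0 + (edge_gain - 1.0) * r * r


def regulate_brightness(pixels, cx, cy, radius, edge_gain=1.5):
    """Return a brightness-compensated copy of a grayscale frame.

    `pixels` is a list of rows of values in 0..255; results are clipped to 255.
    """
    return [
        [min(255, round(p * radial_gain(x, y, cx, cy, radius, edge_gain)))
         for x, p in enumerate(row)]
        for y, row in enumerate(pixels)
    ]
```

With this profile the center of the image is left unchanged while pixels at the rim of
the image circle are brightened by the factor `edge_gain`.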

Referring to Figure 10, linked hot spot stream 1005 provides and removes hot
spots (links to other immersive streams) when appropriate. For instance, in one
example, a user's selection of a region relating to a hot spot should only function
when the object to which the hot spot links is in the displayed perspective corrected
image. Alternatively, hot spots may be provided along the side of a screen or display
irrespective of where the immersive presentation is during playback. In this
alternative embodiment, the hot spots may act as chapter listings.

Figure 12 shows a process for acting on the hot spot stream 1005. For 
reference, image 1201 shows three homes for sale during a real estate tour as may be 
viewed while virtually driving a car. While proceeding down the street from image
1201 to 1202, houses A and B are no longer in view. In one embodiment, the hot spots
linking to immersive video presentations of houses A and B (for example, tours of the
grounds and the interior of the houses) are removed from the hot spots available to the
viewer. Rather, only a hot spot linking to house C is available in image 1202.
Alternatively, all hot spots may be separately accessible to a user as needed, for
example on the bottom of a displayed screen or through keyboard or related input.
The operation of the hot spots is discussed below. In step 1203, a user's input is
received. It is determined in step 1204 where the user's input is located on the image.
In step 1205 it is determined if the input designates a hot spot. If yes, the system
transitions to a new presentation 1206. If not, the system continues with the original
presentation 1207. As to the pay-per-view aspect of the present invention, the system
allows one to charge per viewing of the homes on a per use basis. The tally for the cost
for each tour may be calculated based on the number of hot spots selected.
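
By way of illustration only, the hot spot handling of Figure 12 (steps 1203 through
1207) and the per-selection tally may be sketched as follows. The HotSpot record, the
rectangular pan/tilt region test, and the flat price per selection are assumptions made
for this example; the specification does not prescribe any of them.

```python
from dataclasses import dataclass, field


@dataclass
class HotSpot:
    """Hypothetical hot spot: an angular region linking to another presentation."""
    target: str       # identifier of the linked immersive presentation
    pan_min: float    # region bounds, in degrees of pan
    pan_max: float
    tilt_min: float   # region bounds, in degrees of tilt
    tilt_max: float

    def contains(self, pan: float, tilt: float) -> bool:
        return (self.pan_min <= pan <= self.pan_max
                and self.tilt_min <= tilt <= self.tilt_max)


@dataclass
class Tour:
    current: str                        # presentation now being shown
    price_per_hot_spot: float = 1.00    # assumed flat charge per selection
    selections: list = field(default_factory=list)

    def handle_input(self, pan, tilt, active_hot_spots):
        """Locate the input (step 1204), test it (step 1205), then branch."""
        for spot in active_hot_spots:
            if spot.contains(pan, tilt):
                self.selections.append(spot.target)
                self.current = spot.target   # step 1206: new presentation
                return self.current
        return self.current                  # step 1207: original presentation

    def tally(self) -> float:
        """Cost of the tour, based on the number of hot spots selected."""
        return self.price_per_hot_spot * len(self.selections)
```

In this sketch, only the hot spots passed in as active_hot_spots can be selected, which
corresponds to removing the hot spots for houses A and B once they leave the view.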

Figure 13 shows another method of deriving an income stream from the use of
the described system. In step 1301, a user views a presentation with reception of user
information directing the view. If a user activates the change in field of view to, for
example, follow the movement of the game or to view alternative portions of a
streamed image, the user may be charged for the modification. The record of charges
is compiled in step 1302, and the charge to the account occurs in step 1303.

Figure 14 shows a pay-per-view system in accordance with embodiments of
the present invention. The invention provides a pay-per-view delivery system that
delivers at least a selected portion of video images for at least one view of the event
selected by a pay-per-view user. The event is captured in spherical video images via
multiple streaming data streams. The delivered portion of the streaming data streams
represents the view of the event selected by the pay-per-view user. More than one
view may be selected and viewed using a plurality of windows by the user. Typically,
the event is captured using at least one digital wide angle or fisheye lens. The
pay-for-view delivery system includes a camera imaging system/transceiver 3002, at
least one event view control transceiver 3004, and a display 3006. In this embodiment,
the camera imaging system/transceiver includes at least two wide-angle lenses or a
fisheye lens and, upon receiving control signals from the user selecting the at least
one view of the event, simultaneously captures at least two partial spherical video
images for the event, produces output video image signals corresponding to said at
least two partial spherical video images, and digitizes the output video image signals.
Where needed, the digitizer includes a seamer for seaming together said digitized
output video image signals into seamless spherical video images and a memory for
digitally storing or buffering data representing the digitized seamless spherical video
images. The camera imaging system/transceiver then sends digitized output video image
signals for the at least one portion of the multiple streaming data streams representing
the at least one event to the event control transceiver. The memory may also be
utilized for storing billing data.
Capturing the spherical video images may be accomplished as described, for example,
in United States Patent No. 6,002,430 (Method and Apparatus For Simultaneous
Capture Of A Spherical Image by Danny A. McCall and H. Lee Martin). Thus, upon
capturing the spherical video images in a stream, the camera imaging
system/transceiver digitizes and seams together, where needed, the images and sends
the portion for the selected view to the at least one event view control transceiver.

The at least one event view control transceiver 3004 is coupled to send control
signals activated by the user selecting the at least one view of the event and to receive
the digitized output video image signals from the camera-imaging system/transceiver
3002. The event view control transceiver 3004 typically is in the form of a handheld
remote control 3008 and a set-top box 3010 coupled to a video display system such as
a computer CRT, a television, a projection display, a high definition television, a head
mounted display, a compound curve torus screen, a hemispherical dome, a spherical
dome, a cylindrical screen projection, a multi-screen compound curve projection
system, a cube cave display, or a polygon cave. However, where desired, the event view
control transceiver may have the controls in the set-top box. Where a remote control
device is used, the handheld remote control portion of the event view control
transceiver is arranged to communicate with a set-top box portion of the event view
control transceiver so that the user may more conveniently issue control signals to the
pay-per-view delivery system and adjust the selected view using pan, tilt, rotate, and
zoom adjustments. In one embodiment, the remote control portion has a touch screen
with controls for the particular event shown thereon. The user simply inputs the
location of the event (typically the channel and time), touches the desired view and
the pan, tilt, rotate, and zoom as desired, to initiate viewing of the event at the desired
view. The event view controls send control signals indicating the at least one view for
the event. The event view control transceiver receives at least the digitized portion of
the output video image signals that encompasses said view/views selected and uses a
transformer processor to process the digitized portion of the output video image
signals to convert the output video image signals representing the view/views selected
to digital data representing a perspective-corrected planar image of the view/views
selected.

The display is coupled to receive and display streaming data for the
perspective-corrected planar image of the view/views for the event in response to the
control signals. The display may show the at least one view or a plurality of views in a
plurality of windows on the screen. For example, one may show the front view from a
platform and the side view or back view off the platform. Each window may
display a view that is simultaneously and independently controllable by separate user
input of any combination of pan, tilt, rotate, and zoom.

The event view controls may include switchable channel controls to facilitate
user selection and viewing of alternative/additional simultaneous views as well as
controls for implementing pan, tilt, rotate, and zoom settings. Generally, billing is
based on a number of views selected for a predetermined time period and a total
viewing time utilized. Billing may be accomplished by charging an amount due to
a predetermined credit card of the user, automatically deducting an amount due from a
bank account of the user, sending a bill for an amount due to the user, or the like.
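
As a purely illustrative sketch of the billing basis described above (a number of views
selected in a period plus total viewing time), the amount due might be computed as
follows. The two rates and the use of integer cents are assumptions of this example and
do not appear in the specification.

```python
def amount_due_cents(views_selected: int, viewing_minutes: int,
                     cents_per_view: int = 50,
                     cents_per_minute: int = 10) -> int:
    """Charge for one billing period; both rates are assumed example values."""
    if views_selected < 0 or viewing_minutes < 0:
        raise ValueError("usage figures must be non-negative")
    # Integer cents avoid floating-point rounding in monetary totals.
    return views_selected * cents_per_view + viewing_minutes * cents_per_minute
```

For instance, three views selected and ninety minutes of viewing at the assumed rates
would be billed as amount_due_cents(3, 90), after which the total may be charged to a
credit card, deducted from a bank account, or sent as a bill.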

Figure 15 shows another pay-per-view system in accordance with
embodiments of the present invention.

The invention provides a method for displaying at least one view location of
an event for a pay-per-view user utilizing streaming spherical video images. The steps
of the method include: sequentially capturing a video stream of an event 1501,
selecting at least one viewing location, receiving an immersive video stream regarding
the at least one viewing location 1503, receiving a user input and correcting a selected
portion for viewing 1504.

The method may further include the steps of dynamically switching/adding
1505 a portion of the streaming spherical video images in accordance with selecting,
by the user, alternative/additional simultaneous view locations. The method may also
include receiving user input regarding the new selection and perspective correcting
the new portion 1506. The method may include the step of billing 1507 based on a
number of view locations selected for the time period and, alternatively or in
combination, billing for a total time viewing the image stream. Billing is generally
implemented by charging an amount due to a predetermined credit card of the user,
automatically deducting an amount due from a bank account of the user, or sending a
bill for an amount due to the user. Viewing is typically accomplished via one of: a
computer CRT, a television, a projection display, a high definition television, a head
mounted display, a compound curve torus screen, a hemispherical dome, a spherical
dome, a cylindrical screen projection, a multi-screen compound curve projection
system, a cube cave display, and a polygon cave (as are discussed in U.S. Serial No.
(01096.86942) entitled "Virtual Theater").

Figure 16 shows yet another pay-per-view system in accordance with
embodiments of the present invention. Shown schematically at 11 is a wide angle,
e.g., a fisheye, lens that provides an image of the environment with a 180 degree
field-of-view. The lens is attached to a camera 12 which converts the optical image
into an electrical signal. These signals are then digitized electronically in an image
capture unit 13 and stored in an image buffer 14 within the present invention. An
image processing system consisting of an X-MAP and a Y-MAP processor shown as
16 and 17, respectively, performs the two-dimensional transform mapping. The image
transform processors are controlled by the microcomputer and control interface 15.
The microcomputer control interface provides initialization and transform parameter
calculation for the system. The control interface also determines the desired
transformation coefficients based on orientation angle, magnification, rotation, and
light sensitivity input from an input means such as a joystick controller 22 or
computer input means 23. The transformed image is filtered by a 2-dimensional
convolution filter 28 and the output of the filtered image is stored in an output image
buffer 29. The output image buffer 29 is scanned out by display electronics/event
view control transceiver 20 to a video display monitor 21 for viewing. Where desired,
a remote control 24 may be arranged to receive user input to control the display
monitor 21 and to send control signals to the event view control transceiver 20 for
directing the image capture system with respect to desired view or views which the
pay-per-view user wants to watch.

The user of the software may view perspectively correct smaller portions and
zoom in on those portions from any direction as if the user were in the environment,
causing a virtual reality experience.
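
To illustrate the kind of two-dimensional transform mapping attributed to the X-MAP and
Y-MAP processors, the following sketch computes, for each pixel of the output window,
the source coordinates in a 180 degree fisheye image. It is not the transform of the
referenced patents: an equidistant lens model (image radius proportional to the angle
from the optical axis) is assumed here, and the function names, the pan/tilt axis
conventions, and the normalized pixel coordinates are all hypothetical.

```python
import math


def fisheye_source(u, v, pan_deg, tilt_deg, zoom=1.0, radius=1.0):
    """Map output pixel (u, v), measured from the window center, to fisheye (x, y).

    Assumes an equidistant 180 degree lens whose optical axis is +z and whose
    image circle has the given radius.
    """
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    # Ray through the output pixel; a larger zoom narrows the field of view.
    x, y, z = u, v, zoom
    # Tilt: rotate about the x axis (positive tilt moves the view toward +y).
    y, z = (y * math.cos(tilt) + z * math.sin(tilt),
            -y * math.sin(tilt) + z * math.cos(tilt))
    # Pan: rotate about the y axis (positive pan moves the view toward +x).
    x, z = (x * math.cos(pan) + z * math.sin(pan),
            -x * math.sin(pan) + z * math.cos(pan))
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / norm)))   # angle off the optical axis
    phi = math.atan2(y, x)                             # direction around the axis
    r = radius * theta / (math.pi / 2)                 # equidistant projection
    return r * math.cos(phi), r * math.sin(phi)


def build_maps(width, height, pan_deg, tilt_deg, zoom=1.0, radius=1.0):
    """Return (x_map, y_map): per-pixel source coordinates for the output window."""
    x_map, y_map = [], []
    for row in range(height):
        v = (row - (height - 1) / 2) / (width / 2)
        xs, ys = [], []
        for col in range(width):
            u = (col - (width - 1) / 2) / (width / 2)
            sx, sy = fisheye_source(u, v, pan_deg, tilt_deg, zoom, radius)
            xs.append(sx)
            ys.append(sy)
        x_map.append(xs)
        y_map.append(ys)
    return x_map, y_map
```

In such a scheme the two maps would be recomputed whenever the pan, tilt, or zoom input
changes, and each output pixel would then be fetched (and filtered) from the input image
buffer at the mapped location.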

The digital processing system need not be a large computer. For example, the
digital processor may comprise an IBM/PC-compatible computer equipped with a
Microsoft WINDOWS 95 or 98 or WINDOWS NT 4.0 or later operating system.
Preferably, the system comprises a quad-speed or faster CD-ROM drive, although
other media may be used such as Iomega ZIP discs or conventional floppy discs. An
Apple Computer manufactured processing system should have a MACINTOSH
Operating System 7.5.5 or later operating system with QuickTime 3.0 software or
later installed. The user should ensure that there exist at least 100 megabytes of free
hard disk space for operation. An Intel Pentium 133 MHz or 603e PowerPC 180 MHz
or faster processor is recommended so the captured images may be seamed together
and stored as quickly as possible. Also, a minimum of 32 megabytes of random access
memory is recommended.

Image processing software is typically produced as software media and sold
for loading onto a digital signal processing system. Once the software according to the
present invention is properly installed, a user may load the digital memory of the
processing system with digital image data from a digital camera system, digital audio
files and global positioning data and all other data described above as desired and
utilize the software to seam each two hemisphere set of digital images together to
form IPIX images.

Figure 17 shows a stadium with image capture points in accordance with
embodiments of the present invention. Figure 17 relates to another event capture
system. Figure 17 depicts a sport stadium with event capture cameras located at points
A-F. To show the flexibility of placing cameras, cameras G are placed on the top of
goal posts.

Figure 18 provides a representation of the images captured at the image
capture points of Figure 17 in accordance with embodiments of the present invention.
Figure 18 shows the immersive capture systems of points A-F. While the points are
shown as spheres, it is readily appreciated that non-spherical images may be captured
and used as well. For example, three cameras may be used. If the cameras have lenses
of greater than 120 degrees each, the overlapping portion may be discarded or used in
the seaming process.
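
The overlap available at each seam in the three-camera example follows from simple
arithmetic, sketched below. The sketch assumes, beyond what the specification states,
that the cameras are evenly spaced around a single ring and that only horizontal
coverage is considered; the function name is hypothetical.

```python
def seam_overlap_deg(num_cameras: int, lens_fov_deg: float) -> float:
    """Angular overlap between adjacent cameras evenly spaced around 360 degrees.

    A positive result is overlap that may be discarded or used for seaming;
    a negative result means the lenses leave a gap in coverage.
    """
    if num_cameras < 1:
        raise ValueError("at least one camera is required")
    share = 360.0 / num_cameras      # arc each camera must cover on its own
    return lens_fov_deg - share
```

Under these assumptions, three lenses of 130 degrees each would leave 10 degrees of
overlap at every seam, whereas lenses of exactly 120 degrees would leave none.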

Figure 19 shows the image capture perspectives with additional perspectives
in accordance with embodiments of the present invention. By increasing the number
of cameras arranged around the perimeter of the arena, the effective capture zone may
be increased to a torus-like shape. Figure 19 shows the outline of the shape with more
cameras disposed between points A-F.

Figure 20 shows another perspective of the system of Figure 19 with a
distribution system in accordance with embodiments of the present invention. The
distribution system 2001 receives data from the various capture systems at the various
viewpoints. The distribution system permits various ones of end users X, Y, and Z to
view the event from the various capture positions. So, for example, one can view a
game from the goal line every time the play occurs at that portion of the playing field.

Figure 21 shows an effective field of view concentrating on a playing field in 
accordance with embodiments of the present invention. The effective field of view
concentrates on the playing field only in this embodiment. In particular, the effective 
viewing area created by the sum of all immersive viewing locations comprises the 
shape of a reverse torus. 

Figure 22 shows a system for overlaying generated images on an immersive
presentation stream in accordance with embodiments of the present invention. Figure
22 shows a technique for adding value to an immersive presentation. An image is
captured as shown in 2201. The system determines the location of designated
elements in an image, for example, the flag marking the 10 yard line in football. The
system may use known image analysis and matching techniques. The matching may
be performed before or after perspective correcting a selected portion. Here, the
system may use the detection of the designated element as the selected input control
signal. The system next corrects the selected portion 2203 resulting in perspective
corrected output 2204. The system, using similar image analysis techniques,
determines the location of fixed information (in this example, the line markers) 2205
as shown in 2206 and creates an overlay 2207 to comport with the location of the
designated element (the 10 yard line flag) and commensurate with the appropriate
shape (here, parallel to the other line markers). The system next warps the overlay to
fit to the shape of the original image 2201 as shown by step 2209 and resulting in
image 2210. Finally, in step 2211, the overlay is applied to the original image
resulting in image 2212. It is appreciated that a color mask may be used to define
image 2210 so as to be transparent to all except the color of playing field 2213. Using
this technique, a viewer would have a timely representation of the 10 yard marker
despite looking in various directions as the marking line 2210 would be part of the
immersive video stream shown to the end users. It is appreciated that the corrections
may be performed before the game starts and have pre-stored elements 2210 ready to
be applied as soon as the designated element is detected.
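
The color mask mentioned above can be illustrated with a short sketch in which the
warped overlay is drawn only over pixels whose color is close to that of the playing
field, so that players and other objects standing on the line remain visible. The pixel
representation (rows of RGB tuples), the per-channel tolerance, and the function name
are assumptions of this example rather than details of the described system.

```python
def apply_overlay(frame, overlay, field_color, tolerance=30):
    """Composite a warped overlay onto a frame through a playing-field color mask.

    frame:   rows of (r, g, b) tuples for the original immersive image
    overlay: rows of (r, g, b) tuples, or None where there is nothing to draw
    """
    def is_field(pixel):
        # A pixel counts as field if every channel is within the tolerance.
        return all(abs(c - f) <= tolerance for c, f in zip(pixel, field_color))

    result = []
    for frame_row, overlay_row in zip(frame, overlay):
        result.append([
            ov if ov is not None and is_field(px) else px
            for px, ov in zip(frame_row, overlay_row)
        ])
    return result
```

With a green field color, a yellow marking line supplied in the overlay would replace
green pixels only, leaving a pixel that shows a player unchanged even where the line
crosses it.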

Figure 23 shows an image processing system for replacing elements in
accordance with embodiments of the present invention. Figure 23 shows another
value added way of transmitting information to end users. First, in step 2301, the
system locates designated elements (here, advertisement 2302 and hockey puck
2303). The designated elements may be found by various means as known in the art,
including, but not limited to, a radio frequency transmitter located within the puck and
correlated to the image as captured by an immersive capture system 2304, by image
analysis and matching 2305, and by knowing the fixed position of an advertisement
2302 in relation to an immersive video capture system. Next, a correction or
replacement image for the elements 2302 and 2303 is pulled from a storage (not
shown for simplicity) with corrected images being represented by 2308 and 2309. The
corrected images are warped 2310 to fit the distortion of the immersive video portion
at which location the elements are located (to shapes 2311 and 2312). Finally, the
warped versions of the corrections 2311 and 2312 are applied to the image in step
2313 as 2314 and 2315. It is appreciated that, to increase video throughput, fast moving
objects may not need correction and distortion. Viewers
may not notice the lack of correction to some elements 2315.

Figure 24 shows a boxing ring in accordance with embodiments of the present
invention. Here, immersive video capture systems are shown arranged around the
boxing ring. The capture systems may be placed on a post of the ring 2401, suspended
away from the ring 2403, or spaced from yet mounted to the posts 2402. Finally, a top
level view may be provided of the whole ring 2404. The system may also locate the
boxers and automatically shift views to place the viewer closest to the opponents.

Figure 25 shows a pay-per-view system in accordance with embodiments of
the present invention. First, a user purchases 2501 a key. Next, the user's system
applies the key 2502 to the user's viewing software that permits perspective
correction of a selected portion. Next the system permits selected correction 2503
based on user input. As a value added, the system may permit tracking of action of a
scene 2504.

Figure 26 shows various image capture systems in accordance with
embodiments of the present invention. Aerial platform 2601 may contain GPS locator
2602 and laser range finder 2603. The aerial platform may comprise a helicopter or
plane. The aerial platform 2601 flies over an area 2604 and captures immersive video
images. As an alternative, the system may use a terrestrial based imaging system 2605
with GPS locator 2608 and laser range finder 2607. The system may use the stream of
images captured by the immersive video capture system to compute a
three-dimensional mapping of the environment 2604.

Figure 27 shows image analysis points as captured by the systems of Figure 26
in accordance with embodiments of the present invention. The system captures
images based on a given frame rate. Via the GPS receiver, the system can capture the
location of where the image was captured. As shown in Figure 27, the system can
determine the location of edges and, by comparing perspective corrected portions of
images, determine the distance to the edges. Once the two positions of
2701 and 2702 are known, one may use known techniques to determine the locations of
objects A and B. By using a stream of images, the system may verify the location of
objects A and B with a third immersive image 2703. This may also lead to the
determination of the locations of objects C and D.

Both platforms 2601 and 2608 may be used to capture images. Further, one
may compute the distance between images 2701 and 2702 by knowing the velocity of
the platform and the image capture rate. Systems disclosing object location include
U.S. Patent No. 5,694,531 and U.S. Patent No. 6,005,984.
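
As one example of the known techniques referred to above, the spacing between two
capture positions can be derived from platform velocity and capture rate, and an object
seen from both positions can then be located by planar triangulation. The sketch below
assumes straight-line travel at constant velocity, bearings measured from the direction
of travel, and a two-dimensional scene; these simplifications and the function names
are assumptions of the example and are not taken from the cited patents.

```python
import math


def baseline_m(velocity_m_per_s, frames_per_s, frames_apart=1):
    """Distance travelled between two captured images."""
    return velocity_m_per_s * frames_apart / frames_per_s


def triangulate(baseline, bearing1_deg, bearing2_deg):
    """Locate an object from bearings taken at two positions on a straight track.

    The first position is the origin and the second lies at (baseline, 0).
    Bearings are measured from the direction of travel toward the object.
    Returns the object position (x, y) in the same units as the baseline.
    """
    a1, a2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    parallax = a2 - a1                      # angle subtended at the object
    if abs(math.sin(parallax)) < 1e-9:
        raise ValueError("bearings are parallel; the object cannot be located")
    # Sine rule in the triangle formed by the two positions and the object.
    range1 = baseline * math.sin(a2) / math.sin(parallax)
    return range1 * math.cos(a1), range1 * math.sin(a1)
```

A third image, such as 2703, could be used in the same way with either earlier position
to confirm the computed location.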

Further, one may use a second platform 2606 at a different time of the day to
capture a slightly different image set of environment 2604. By having a different
position of the sun, different edges may be revealed and captured. Using this time
differential method, one may find edges not found in one single image. Further, one
may compare the two 3D models and take various values to determine the locations of
polygons in the data sets.

Figure 28A shows an image 2701 taken at a first location. Figure 28B shows 
2702 captured at a second location. Figure 28C shows 2703 taken at a third location. 

Figure 29 shows a laser range finder and lens combination scanning between
two trees.

Moreover, as shown in Figure 30, one may use a laser range finder to
determine distances to elements on the side of the platform. The system correlates the
images to the laser range finder data 3001. Next, the system creates a model of the
environment 3002. First the system finds edges 3004. Next, the system finds distances
to the edges 3005. Next, the system creates polygons from the edges 3006. Next, the
system paints the polygons with the colors and textures of a captured image 3003.

Figures 31A-C show a plurality of applications that utilize advantages of
immersive video in accordance with the present invention. These applications include,
e.g., remote collaboration (teleconferencing), remote point of presence camera
(web-cam, security and surveillance monitoring), transportation monitoring (traffic cam),
tele-medicine, distance learning, etc.

Referring to Figure 31A, an exemplary arrangement of the invention as used
in teleconferencing/remote collaboration is shown. Locations A-N 3150A-3150N
(where N is a plurality of different locations) may be configured for teleconferencing
and/or remote collaboration in accordance with the invention. Preferably, each
location includes, e.g., an immersive video capture apparatus 3151A-N (as described in
this and related applications), at least one personal computer (PC) including display
3152A-N and/or a separate remote display 3153A-N. The immersive video apparatus
3151 is preferably configured in a central location to capture real time immersive
video images for an entire area requiring no moving parts. The immersive video
apparatus 3151 may output captured video image signals received by a plurality of
remote users at the remote locations 3150 via, e.g., the Internet, Intranet, or a
dedicated teleconferencing line (e.g., an ISDN line). Using the invention, remote users
can independently select areas of interest (in real time video) during a teleconference
meeting. For example, a first remote user at location B 3150B can view an immersed
video image captured by immersive video apparatus 3151A at location A 3150A. The
immersed image can be viewed on a remote display 3153B and/or display coupled to
PC 3152B. The first remote user can select areas of interest in the displayed immersed
image for perspective corrected video viewing. The system produces the equivalent of
pan, tilt, zoom, and rotation within a selected view, transforming a portion of the
captured video image based upon user or pre-selected commands, and producing one
or more output images that are in correct perspective for human viewing in
accordance with the user selections. The perspective corrected image is further
provided in real time video and may be displayed on remote display 3153 and/or PC
display 3152. A second remote user at, e.g., location B 3150B or location N 3150N,
can simultaneously view the immersed video image captured by the same immersive
video apparatus 3151A at location A 3150A. The second user can view the immersed
image on the remote display or on a second PC (not shown). The second remote user
can select areas of interest in the displayed immersed image for perspective corrected
video viewing independent of the first remote user. In this manner each user can
independently view a particular area of interest captured by the same immersive video
apparatus 3151A without additional cameras and/or cameras conventionally requiring
mechanical movements to capture images of particular areas of interest. PC 3152
preferably is configured with remote collaboration software (e.g., Collaborator by
Netscape, Inc.) so that users at the plurality of locations 3150A-N can share
information and collaborate on projects as is known. The remote collaboration
software in combination permits a plurality of users to share information and conduct
remote conferences independent of other users.

Referring to Figure 31B, an exemplary arrangement of the invention as used in
security monitoring and surveillance is shown. In a preferred arrangement, a single
immersive video capture apparatus 3161, in accordance with the invention, is
centrally installed for surveillance. In this arrangement, the single apparatus 3161 can
be used to monitor an open area of an interior of a building, or monitor external
premises, e.g., a parking lot, without requiring a plurality of cameras or
conventional cameras that require mechanical movements to scan areas greater than
the field of view of the camera lens. The immersive video image captured by the
immersive video apparatus 3161 may be transmitted to a display 3163 at remote
location 3162. A user at remote location 3162 can view the immersed video image on
display or monitor 3163. The user can select an area of particular interest for viewing in
perspective corrected real time video.

Referring to Figure 31C, an exemplary arrangement of the invention as used in
transportation monitoring (e.g., traffic cam) is shown. In this configuration, an
immersive video apparatus 3171, in accordance with the invention, is preferably
located at a traffic intersection, as shown. It is desirable that the immersive video
apparatus 3171 is mounted in a location such that the entire intersection can be monitored
in immersive video using only a single camera. In accordance with the invention, the
captured immersive video image may be received at a remote location and/or a
plurality of remote locations. Once the immersed video image is received, the user or
viewer of the image can select particular areas of interest for perspective corrected
immersive video viewing. The immersive video apparatus 3171 produces the
equivalent of pan, tilt, zoom, and rotation within a selected view, transforming a
portion of the video image based upon user or pre-selected commands, and producing
one or more output images that are in correct perspective for human viewing in
accordance with the user selections. In contrast to conventional techniques that
require a plurality of cameras located in each direction (in some cases multiple
cameras in each direction), the present invention preferably utilizes a single
immersive video apparatus 3171 to capture immersive video images in all directions.

Accordingly, there has been described herein a concept as well as several
embodiments including a preferred embodiment of a pay-for-view display delivery
system for delivering at least a selected portion of video images for an event wherein
the event is captured via multiple streaming data streams and the delivery system
delivers a display of at least one view of the event, selected by a pay-per-view user,
using at least one portion of the multiple streaming data streams and wherein the
event is captured using at least one digital wide angle/fisheye lens.

Although the present invention has been described in relation to particular
preferred embodiments thereof, many variations, equivalents, modifications and other
uses will become apparent to those skilled in the art. It is preferred, therefore, that the
present invention be limited not by the specific disclosure herein, but only by the
appended claims.

CLAIMS

We claim: 

1. A pay-for-view display delivery system for delivering at least a
selected portion of video images for an event wherein the event is captured via
multiple streaming data streams and the delivery system delivers a display of at least
one view of the event selected by a pay-per-view user using at least one portion of the
multiple streaming data streams and wherein the event is captured using at least one
digital wide angle/fisheye lens comprising:

a camera imaging system/transceiver having at least two wide-angle lenses/a
fisheye lens for receiving control signals from the user selecting the at least one view
of the event, simultaneously capturing at least two partial spherical video images for
the event, producing output video image signals corresponding to said at least two
partial spherical video images, digitizing the output video image signals, wherein,
where needed, the digitizer includes a seamer for seaming together said digitized
output video image signals into seamless spherical video images and a memory for
digitally storing/buffering data representing said digitized seamless spherical video
images and where selected, for storing billing data, and sending digitized output video
image signals for the at least one portion of the multiple streaming data streams
representing the at least one event to the event control transceiver,

the at least one event view control transceiver, coupled to send control signals
activated by the user selecting the at least one view of the event and to receive the
digitized output video image signals from said camera-imaging system/transceiver,
having event view controls for selecting and sending control signals indicating at least
one view for an event and for receiving at least the digitized portion of the output
video image signals that encompasses said view/views selected, wherein the event
view control transceiver includes a transformer processor, responsive to said digitized
portion of the output video image signals, for converting said output video image
signals representing the view/views selected to digital data representing a
perspective-corrected planar image of the view/views selected; and

a display, coupled to receive and display streaming data for said
perspective-corrected planar image of the view/views for the event in response to said control
signals, wherein said display is shown on at least one window that displays the at least
one view of a plurality of views from said seamless spherical video images, and,
wherein each window may simultaneously display a view that is simultaneously
controllable by separate user input of any combination of pan, tilt, rotate, and zoom.

5 2 - The pay-for-view display delivery system of claim 1 wherein the event 

view controls include dynamically switchable channel controls to facilitate user 
selection and viewing of alternative/additional simultaneous views. 

3. The pay-for-view display delivery system of claim 1 wherein the event
view controls include dynamically switchable channel controls to facilitate user
selection and viewing of alternative/simultaneous views using at least one different
one of: pan, tilt, rotate, and zoom setting.

4. The pay-for-view delivery system of claim 1 wherein the user is billed 
on a periodic basis based on a number of views selected for the time period and a total 
viewing time utilized. 
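
The periodic billing recited in claim 4 can be illustrated with a minimal sketch. The claim states only that the bill depends on the number of views selected and the total viewing time; the additive formula, the two rates, the counting of distinct view identifiers, and the names `ViewingSession` and `periodic_bill` are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ViewingSession:
    view_id: str    # identifier of the view the user selected (hypothetical)
    seconds: float  # time spent watching that view


def periodic_bill(sessions, rate_per_view=0.50, rate_per_minute=0.10):
    """Amount due for one billing period.

    Both rates and the additive formula are assumptions; the claim only
    names the two inputs (number of views, total viewing time).
    """
    # Count each distinct view once, however many times it was reopened.
    distinct_views = len({s.view_id for s in sessions})
    total_minutes = sum(s.seconds for s in sessions) / 60.0
    amount = distinct_views * rate_per_view + total_minutes * rate_per_minute
    return round(amount, 2)
```

For example, two distinct views watched for a combined twenty minutes would be billed as two view charges plus twenty minutes of viewing time under these assumed rates.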

5. The pay-for-view delivery system of claim 4 wherein billing of the
user is accomplished by charging an amount due to a predetermined credit card of
the user.

6. The pay-for-view delivery system of claim 4 wherein billing of the
user is accomplished by automatically deducting an amount due from a bank account
of the user.

7. The pay-for-view delivery system of claim 4 wherein billing of the 
user is accomplished by sending a bill for an amount due to the user. 

8. A method of displaying at least one view location of an event for a
pay-per-view user utilizing streaming spherical video images, comprising the steps of:

selecting, by a pay-per-view user, the at least one viewing location of the
event to be viewed;

sequentially capturing, by a spherical video image capturing system, said
streaming spherical video images for the event at real-time video rates;

receiving, by the pay-per-view user, and perspective-correcting a portion of the
streaming spherical video images that corresponds to the pay-per-view user's
selecting of the at least one viewing location; and

sequentially displaying at real-time video rates, the portion of the streaming
spherical video images that has been perspective-corrected wherein the viewing
location/locations has/have been transformed to appear to emanate from the at least
one viewing location for the event selected by the pay-per-view user.
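
The perspective-correcting step of claim 8 can be sketched as a per-pixel mapping from the planar output view back into the wide-angle source image. The sketch below assumes an equidistant fisheye projection covering 180 degrees (image radius proportional to the angle off the optical axis), which the claim does not specify; the function name `perspective_to_fisheye` and all of its parameters are hypothetical, and rotation of the view about its own axis is omitted for brevity.

```python
import math


def perspective_to_fisheye(u, v, pan, tilt, focal, cx, cy, radius):
    """Map an output-plane pixel offset (u, v) to fisheye image coordinates.

    Assumes an equidistant 180-degree fisheye whose optical axis is +z and
    whose image circle has centre (cx, cy) and the given radius in pixels.
    pan and tilt are in radians; focal (in pixels) acts as the zoom control.
    """
    # Ray through the output pixel in the virtual camera's own frame.
    x, y, z = float(u), float(v), float(focal)
    # Tilt: rotate the ray about the x axis.
    y, z = (y * math.cos(tilt) - z * math.sin(tilt),
            y * math.sin(tilt) + z * math.cos(tilt))
    # Pan: rotate the ray about the y axis.
    x, z = (x * math.cos(pan) + z * math.sin(pan),
            -x * math.sin(pan) + z * math.cos(pan))
    # Angle off the optical axis and direction around it.
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)
    # Equidistant model: theta of 90 degrees lands on the image circle edge.
    r = radius * theta / (math.pi / 2)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)
```

Evaluating this mapping for every pixel of the output window, and sampling the source image at the returned coordinates, would yield a planar view for the selected pan, tilt, and zoom.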

9. The method of claim 8 further including dynamically switching/adding
a portion of the streaming spherical video images in accordance with selecting, by the
user, alternative/additional simultaneous view locations.

10. The method of claim 8 further including dynamically
switching/altering a portion of the streaming spherical video images in accordance
with selecting, by the user, alternative/additional simultaneous view locations using at
least one different one of: pan, tilt, rotate, and zoom setting.

11. The method of claim 8 further including the step of billing the user on 
a periodic basis based on a number of view locations selected for the time period and 
a total viewing time utilized. 

12. The method of claim 11 wherein billing of the user is accomplished by
charging an amount due to a predetermined credit card of the user.

13. The method of claim 11 wherein billing of the user is accomplished by
automatically deducting an amount due from a bank account of the user.

14. The method of claim 11 wherein billing of the user is accomplished by
sending a bill for an amount due to the user.

15. The method of claim 8, wherein viewing is accomplished via one of: a
computer CRT, a television, a projection display, a high definition television, a head
mounted display, a compound curve torus screen, a hemispherical dome, a spherical
dome, a cylindrical screen projection, a multi-screen compound curve projection
system, a cube cave display, and a polygon cave.

16. A computer-readable medium having computer-executable instructions 
for displaying at least one view location of an event for a pay-per-view user utilizing 
streaming spherical video images, comprising the steps of: 

receiving information indicating selection, by a pay-per-view user, of the at
least one viewing location of the event to be viewed;

sequentially capturing said streaming spherical video images for the event at
real-time video rates from a streaming spherical video capturing system;

receiving and perspective-correcting a portion of the streaming spherical video
images that corresponds to the pay-per-view user's selection of the at least one
viewing location; and

sequentially sending, to a display/recording device at real-time video rates, the
portion of the streaming spherical video images that has been perspective-corrected
wherein the viewing location/locations has/have been transformed to appear to
emanate from the at least one viewing location for the event selected by the
pay-per-view user.

17. The computer-readable medium of claim 16 further including 
dynamically switching/adding a portion of the streaming spherical video images in 
accordance with selecting, by the user, alternative/additional simultaneous view 
locations. 

18. The computer-readable medium of claim 16 further including
dynamically switching/altering a portion of the streaming spherical video images in
accordance with selecting, by the user, alternative/additional simultaneous view
locations using at least one different one of: pan, tilt, rotate, and zoom setting.

19. The computer-readable medium of claim 16 further including the step
of billing the user on a periodic basis based on a number of view locations selected for
the time period and a total viewing time utilized.

20. The computer-readable medium of claim 19 wherein billing of the user
is accomplished by charging an amount due to a predetermined credit card of the
user.

21. The computer-readable medium of claim 19 wherein billing of the user
is accomplished by automatically deducting an amount due from a bank account of
the user.

22. The computer-readable medium of claim 19 wherein billing of the user
is accomplished by sending a bill for an amount due to the user.

23. The computer-readable medium of claim 16, wherein the recording
device is one of: a video recorder, a DVD, a CD-ROM, a magnetic tape system, an
optical recorder and a digital recorder.

24. A computer readable medium having computer readable instructions
for permitting viewing of immersive video presentations comprising the steps of:

receiving a data file containing an immersive video presentation; 

receiving a user input designating a desired direction of view; 

transforming, in real time and in response to said user input, an image relating to a
portion of said immersive video presentation.

25. The computer readable medium according to claim 24, further
comprising the step of:

storing said data file in an alternate representation. 

26. A method for creating a three dimensional model of an environment, 
the method comprising the steps of: 

obtaining a first video image of the environment using a first video camera at a
first position;

obtaining a second video image of the environment using a second video 
camera at a second position different than the first position; 

comparing the first video image with the second video image; and 

generating a three dimensional model of the environment according to a result
of the step of comparing.
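
One way the comparing and generating steps of claim 26 might be realized is by triangulating matched image points. The sketch below is only an illustration under simplifying assumptions that the claim does not state: two rectified pinhole cameras (not fisheye lenses) separated horizontally by a known baseline, with pixel coordinates measured from each image centre. The name `triangulate` and its parameters are hypothetical.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m):
    """Recover a 3D point from one feature matched in two images.

    Assumes rectified pinhole cameras: the same feature appears on the same
    row y in both images, shifted horizontally by the disparity.
    Returns (X, Y, Z) in metres in the first camera's frame.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("feature must have positive disparity")
    # Depth is inversely proportional to disparity: Z = f * B / d.
    z = focal_px * baseline_m / disparity
    # Back-project the pixel through the first camera at that depth.
    return (x_left * z / focal_px, y * z / focal_px, z)
```

Repeating this for many matched features would give a cloud of points from which a surface model of the environment could be built.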

27. The method of claim 26, wherein the step of generating further
includes performing edge extraction on at least one of the first and second video
images.
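
The edge extraction of claim 27 is not tied to any particular operator. As a hedged illustration, the sketch below uses the Sobel gradient operator, chosen here only as a familiar example, on a grayscale image stored as a list of rows; the function name `sobel_magnitude` is hypothetical.

```python
import math


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    img is a list of equal-length rows of numbers. Border pixels are left
    at 0.0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
                  - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1])
                  - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]))
            out[r][c] = math.hypot(gx, gy)
    return out
```

Pixels whose magnitude exceeds a chosen threshold can be treated as edge points, which may then be matched between the first and second video images.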

28. The method of claim 26, wherein the step of obtaining the first image
includes obtaining the first video image using a first fisheye lens, and the step of
obtaining the second image includes obtaining the second video image using a second
fisheye lens.

29. The method of claim 28, wherein the first fisheye lens is the second
fisheye lens.

30. The method of claim 28, wherein the first video camera is the second
video camera, and the step of obtaining the second video image includes moving the
first video camera to the second position and obtaining the second video image using
the first video camera at the second position.

31. The method of claim 30, wherein the first video camera is coupled to a
flying machine, the method further including the flying machine flying so as to move the
first video camera from the first position to the second position.

32. The method of claim 30, wherein the camera is coupled to a platform,
the platform moving so as to move the first video camera from the first position to the
second position.

33. The method of claim 26, further including painting a portion of the
three dimensional model with a color of a corresponding portion of at least one of the
first and second video images.

34. The method of claim 33, wherein the step of painting includes
texture-mapping the portion of the three dimensional model with the color of the
corresponding portion of the at least one of the first and second video images.

35. The method of claim 30, further including the step of measuring a
distance between a third position associated with a position of the first video camera
and a portion of the environment corresponding to the portion of the at least one of the
first and second video images, the step of generating including correlating the at least
one of the first and second video images with the distance measured and generating
the three dimensional model of the environment based on the distance measured.

36. The method of claim 35, further including using a laser range finder to
measure the distance.

37. A system for creating a three dimensional model of an environment, 
the system comprising: 

a first video camera configured to obtain a first video image of the
environment from a first position;

a second video camera configured to obtain a second video image of the
environment from a second position different than the first position; and

a processor coupled to the first and second video cameras and configured to
compare the first video image with the second video image and generate a three
dimensional model of the environment according to the comparison.

38. The system of claim 37, wherein the first video camera is the second 
video camera. 

39. The system of claim 38, further including a distance measuring device
coupled to the processor and configured to measure a distance between the distance
measuring device and a portion of the environment corresponding to the portion of the
at least one of the first and second video images, wherein the processor is configured
to correlate the at least one of the first and second video images with the distance
measured and generate the three dimensional model based on the distance measured.

40. The system of claim 39, wherein the distance measuring device
comprises a laser range finder.

41. The system of claim 37, wherein the processor is further configured to 
perform edge extraction on at least one of the first and second video images in order 
to generate the three dimensional model. 

42. The system of claim 37, wherein the first and second video cameras
each have a fisheye lens through which the first and second video images are
obtained.

43. The system of claim 37, wherein the processor is further configured to 
paint a portion of the three dimensional model with a color of a corresponding portion 
of at least one of the first and second video images. 

44. The system of claim 43, wherein the processor is further configured to
paint the portion of the three dimensional model by texture-mapping the portion of the
three-dimensional model with the color of the corresponding portion of the at least
one of the first and second video images.

45. A method for creating a three dimensional model of an environment,
the method comprising the steps of:

obtaining a first video image of the environment using a first video camera at a
first time at which light is incident upon the environment at a first angle;

obtaining a second video image of the environment using a second video
camera at a second time different than the first time at which light is incident upon the
environment at a second angle different from the first angle;

comparing the first video image with the second video image; and 

generating a three dimensional model of the environment according to a result 
of the step of comparing. 

46. A method for remote collaboration at a first location of a plurality of
locations and displaying an immersive video image with at least one user of a
plurality of users at at least one of a plurality of remote locations, the method
comprising:

capturing the immersive real time video image at the first location; 

receiving the immersive video image at at least a first remote location;

displaying the received immersive video image on a display at said first
remote location;

receiving user inputs for viewing perspective corrected selected portions of
the real time video image from a user at said first remote location; and

displaying the selected portions of the real time video image as a perspective
corrected image at real time video rates at said first location.

47. A method for remote collaboration as recited in claim 46, further 
comprising: 

receiving the immersive video image at a second remote location; 

displaying the received immersive video image on a display at said second 
remote location;

receiving user inputs for viewing perspective corrected selected portions of
the real time video image from a user at said second remote location, said selected
portions being different from the selected portions selected by the user at the first
remote location; and

displaying the selected portions by the user at the second remote location as a
perspective corrected image at real time video rates at said second location.

48. The method as recited in claim 47, receiving the immersive video 
image at said first location via an Internet. 

49. The method as recited in claim 47, receiving the immersive video
image at said first location via an Intranet.

50. The method as recited in claim 47, receiving the immersive video 
image at said second location via an Internet. 

51. The method as recited in claim 47, receiving the immersive video
image at the second location via an Intranet.

52. A system for remote collaboration at a first location of a plurality of
locations and displaying an immersive video image with at least one user of a
plurality of users at at least one of a plurality of remote locations, the system
comprising:

an immersive video apparatus for capturing the immersive real time video
image at the first location, said apparatus having at least one wide angle lens;

a first receiver for receiving the immersive video image at at least a first remote
location;

a first display for displaying the received immersive video image at said first
remote location; and

a first input device for receiving user inputs for viewing perspective
corrected selected portions of the real time video image from a user at said first
remote location, the display for further displaying the selected portions of the real
time video image as a perspective corrected image at real time video rates at said first
location.

53. The system as recited in claim 52, further comprising: 

a second receiver for receiving the immersive video image at a second remote 
location; 

a second display for displaying the received immersive video image at said
second remote location; and

a second input device for receiving user inputs for viewing perspective
corrected selected portions of the real time video image from a user at said second
remote location, said selected portions being different from the selected portions
selected by the user at the first remote location, the display for displaying the selected
portions by the user at the second remote location as a perspective corrected image at
real time video rates at said second location.

54. A method for real time remote surveillance comprising: 

capturing an immersive real time video surveillance image at a first location, 
said image displaying an entire region being monitored; 

receiving the immersive video surveillance image at at least one remote location;

displaying the received immersive video surveillance image on a display at 
said at least one remote location; 

receiving user inputs for viewing perspective corrected selected portions of
the region being monitored from a user at said at least one remote location; and

displaying the selected portions of the real time video image of the region
being monitored as a perspective corrected image at real time video rates at said at
least one location.

55. A method for real time remote surveillance as recited in claim 54, 
further comprising: 

receiving additional user inputs for viewing additional perspective corrected
selected portions of the region being monitored from said user at said first remote
location, said additional user inputs being different from said user inputs; and

displaying the additional perspective corrected selected portions of the region
being monitored at the first remote location.
